Monday, 22 February 2016

Metadata Update #28 - Identifiers, 3 years later

Way back in Metadata Update #13 (Feb. 14, 2013), I spoke briefly about the role and importance of identifiers in online electronic information. Three years later, it is clear that the talk about identifiers wasn’t a flash in the pan: identifiers were the talk of the town at ALA Midwinter once again.

As libraries have experimented with BIBFRAME and moved from BF 1.0 to BF 2.0, and as linked data work has moved from the theoretical LD4L to the practical LD4P, certain things about library data and the wider information environment have gradually come into much clearer focus. We have “sort of known” these things for a long time, but we are only now starting to understand what they really mean for the day-to-day work of creating and managing metadata. One of those “things” is the importance of identifiers.

Libraries have long used controlled vocabularies, in which a single word, phrase or form of a name represents a single person, place, thing or event. Cataloguers and other librarians understand that these vocabularies (controlled headings, name authority data, etc.) help with collocation and disambiguation. I often hear discussions about the need for better disambiguation of terms and persons in online environments. It seems that the larger our body of electronic information grows, the more we need to be able to easily bring together all of the information on the same topic while filtering out the voluminous noise of irrelevant information. In academic environments, it is becoming increasingly important for authors to collocate their own work, and for institutions to do the same for the textual and artistic outputs of their faculty and researchers. Identifiers help with all of this.

The idea behind identifiers builds on the older concept of controlled vocabularies. In our current information environment, an identifier (the thing used to represent a person, place, thing or event) is much more flexible and powerful than anything that was possible in the past. For the most part, identifiers are numeric codes rather than textual strings. Their power is that they can be mapped to multiple scripts and languages and linked to multiple other identifiers, so that information seekers can explore topics and see relationships among persons and events in increasingly complex and rich ways. As VIAF (the Virtual International Authority File) has shown, even where there are multiple systems of identifiers, it is possible to map them to one another to collocate the works of an author and to disambiguate similar or identical names. This can be clearly seen in some of the better linked data discovery environments. As I found out at ALA this year, not all linked data is good linked data, but when it is done right, it can knock your socks off.
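
For the more technically minded, here is a minimal Python sketch of the mapping idea: equivalence links between identifiers from different schemes are chained together into clusters, and all the labels (in any script) attached to a cluster can then be gathered in one place. It assumes a simple list of “same as” pairs; the `lcnaf:`, `isni:` and `orcid:` prefixes, the identifier values and the names are all invented for illustration. This is not how VIAF actually does its matching, which is far more involved; it only shows the basic principle of chaining equivalences.

```python
from collections import defaultdict

# Hypothetical "same as" links between identifiers in different schemes.
# The prefixes and values are invented for illustration only.
SAME_AS_LINKS = [
    ("lcnaf:n0000001", "isni:0000000000000001"),
    ("isni:0000000000000001", "orcid:0000-0000-0000-0001"),
    ("lcnaf:n0000002", "isni:0000000000000002"),
]

# Hypothetical labels attached to individual identifiers, in two scripts.
LABELS = {
    "lcnaf:n0000001": "Ivanov, Ivan",
    "isni:0000000000000001": "Иванов, Иван",
    "lcnaf:n0000002": "Ivanova, Anna",
}


def build_clusters(links):
    """Group identifiers linked directly or through a chain, using union-find."""
    parent = {}

    def find(ident):
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # path halving keeps chains short
            ident = parent[ident]
        return ident

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    return list(clusters.values())


def labels_for(cluster, labels):
    """Collect every label, in any script, recorded for identifiers in one cluster."""
    return sorted(labels[ident] for ident in cluster if ident in labels)


clusters = build_clusters(SAME_AS_LINKS)
for cluster in clusters:
    print(sorted(cluster), labels_for(cluster, LABELS))
```

Notice that the ORCID identifier is never linked directly to the LC identifier, yet chaining through the ISNI still brings all three together, along with the Latin and Cyrillic forms of the name.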

So now we know that identifiers are a good thing and linked data is a good thing, but what does it all mean for the average metadata or cataloging librarian? At ALA it became apparent that librarians are beginning to think about all of our controlled headings that have no associated authority data, keywords stored as subject headings and, to a lesser extent, blind references. We’ve known about these sorts of issues for a long time, but because our OPACs and discovery systems are relatively simple, these problems haven’t caused the systems to break down. Linked data triplestores, however, can’t be Swiss cheese: if you’re linking to something, the thing that you want to link to needs to exist. So what does this really mean, and what do we need to face up to in our day-to-day work? A lot of librarians are taking the situation to mean that we need to start creating a whole lot of identifiers somehow. Some are looking at lowering the training threshold so that production of NARs (LC name authority records) can increase dramatically. Others are looking at automated or systematic ways to create ISNI and ORCID identifiers, and others still are looking at ways to create local identifiers as placeholders until proper identifiers, in the form of NARs or LCSH headings for example, are created. On the flip side, I’ve also heard that it’s not necessary for every person, place, thing, etc. to have an associated identifier for BIBFRAME or other linked data for libraries to work. In response, I’ve heard other librarians say that these systems may work, but they won’t work as well as they could.
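
The “Swiss cheese” problem can be illustrated with a small Python sketch that looks for dangling links: objects that look like identifiers but are never described anywhere in the data. It assumes triples held as plain (subject, predicate, object) tuples and uses an invented `ex:` prefix to tell identifiers apart from literal strings such as titles and names. A real triplestore would be queried (with SPARQL, for instance) rather than scanned like this; the sketch only shows the idea.

```python
# Hypothetical triples as (subject, predicate, object) tuples.
# The "ex:" prefix marks identifiers; anything else is treated as a literal.
TRIPLES = [
    ("ex:work1", "ex:creator", "ex:person1"),
    ("ex:work1", "ex:subject", "ex:topic9"),
    ("ex:work1", "ex:title", "Notes on metadata"),
    ("ex:person1", "ex:label", "Doe, Jane"),
]


def dangling_links(triples, prefix="ex:"):
    """Return identifiers used as objects that are never described as subjects."""
    described = {subject for subject, _, _ in triples}
    return sorted({
        obj for _, _, obj in triples
        if obj.startswith(prefix) and obj not in described
    })


print(dangling_links(TRIPLES))  # ['ex:topic9']
```

Here `ex:person1` passes because it has a description of its own, while `ex:topic9` is the linked data equivalent of a heading with no authority data behind it.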

For my part, I know how much work it is to create NARs now that RDA coding is required. However, when I look at the quality of a new RDA NAR and its potential in our more complex information environments, I can see the value of the work. My mind does remain a little boggled about where we go from here. I’m taking a stab at assuming that it would be a good idea for me to get more efficient at creating NARs, and also to focus on making more NARs for authors from my geographical location and for those from whom I can collect the information we now store in RDA NARs. In general, thoughts about identifiers and how we can create more of them remain in the back of my mind, and I will certainly be looking for opportunities as they arise.

Given the readership my blog seems to have picked up again in the last couple of months, I’d like to put the question out to my readers: what do you think about identifiers in a linked data environment, and what does it all mean for the work we are doing today and will need to do in the near future? As many of us say, now is an interesting time to be a librarian, and this is another issue that reinforces that idea.

By the way, I have heard that some of my readers have been binge-reading my blog posts! Thanks for your emails and feedback. I had even forgotten some of what I wrote three or more years ago. I enjoy hearing what people have found most interesting, and also learning how quickly things can date in our field. It reminds me that I shouldn’t let quite so much time pass between posts. I hear and appreciate what one of you said about the overall rate of change in our field; I suffer from the same information overload and the same overwhelming curiosity about what is up-and-coming. I have a big backlog of topics I’d like to cover, and I think I was getting bogged down in that list and ended up not writing much of anything. I will shift my focus away from that list and onto what is currently being discussed. It might be easier for me to keep up if I write about whatever happens to have my interest at the moment rather than topics I feel I “ought” to cover. Seeing as there is interest in reading the blog, I’ll try to increase my output from one post every two months or so to two posts a month. We’ll see how that goes. And if I don’t keep my word, let me know by email again!

A number of you have also said that the mini-MOOC and cataloging calculator were very valuable, so I'll continue to suggest tools and videos. This time, you might want to have a look at OCLC's Classify tool, which can be a quick way to work out a classification number when you don't have time to search around in ClassWeb or don't have a subscription at all. Check it out:
