Monday, 29 February 2016

Metadata Update # 29 - Strings Versus Things

Strings versus things: this has been a common topic of debate in cataloging circles lately. The underlying idea is not new: a string of text which represents or describes something is more difficult to construct, and less versatile, than an assigned code which can be mapped to represent the “thing”. The reality is that there is a long tradition of “string creation” in the realm of library metadata. The science of creating metadata for libraries had its origin in a time before the invention of electronic computers. The earliest metadata was recorded by human beings on paper or in a paper-based location; coded using human language; and read and interpreted directly by the human eye. Humans read and can readily make sense of words and sentences. Words and sentences are made up of strings of text. So that people can make sense of them, the strings are typically organized according to a standard (e.g. ISBD). When the MARC standard was developed in the late 1960s, it was built to organize and format strings of text in a container which could be read by computers. The Library of Congress could, for example, have decided over 40 years ago to convert library metadata to something less “string-based”, and the tradition of constructing complex strings could have gradually shifted toward the use of codes to represent “things”. For the most part, this transition has not happened.

There is an irony in holding tight to the “strings” model. Many librarians and library workers may find “string-based” metadata more user-friendly because the values are expressed in human language. A person who is not trained in library metadata standards can generally read and make sense of string-based metadata. Thus many library workers may feel comfortable with, and prefer, the idea of string-based metadata. The irony arises from the fact that learning how to create effective strings which can function in today’s crowded information environment is a challenging task. This year at ALA Midwinter, I heard a number of speakers discuss how long it takes for a new librarian to build basic cataloging skills, let alone become an expert. Most experienced cataloguers will estimate the learning period as lasting from two to three years. That estimate assumes that the new librarian has a local expert to instruct and mentor them through the learning process. In smaller libraries, the reality is that the new librarian may be doing much of the learning on his or her own, which makes the learning curve a little steeper. From my own experience with taking the NACO training and going through the review process, I can attest to the complexity that lies behind learning to do a task which may seem, on the surface, to be little more than data entry. So, considering that it takes all of that time and effort to learn how to create string-based metadata, one might assume that its quality and utility would be superior to other forms of metadata. While the quality is likely to be equal, many may find it initially surprising that code-based representations of “things” are actually much more useful and versatile. The reality is that once a code is assigned to a person, place, thing or concept (etc.), that code can be mapped to multiple languages and scripts. The ability to map the codes in this manner helps to serve a discovery environment which is not only diverse locally but must also meet the needs of an increasingly global audience. Thus, by assigning a code once for all discovery environments, the metadata creator can work in his or her native language to create highly flexible, internationalized metadata. Perhaps libraries’ recent experiments with BIBFRAME and other forms of linked data are starting to bring this reality into focus.
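
For readers who like to see an idea in code, here is a minimal sketch of what “assign the code once, map it to many languages” might look like. The identifier `ex:place/0001`, the label table and the `display_label` function are all hypothetical and invented for this illustration; they are not taken from any real authority file or library system.

```python
# A minimal sketch, not a real authority file: one code stands for the
# "thing", and the display strings for each language hang off that code.
# The identifier "ex:place/0001" is hypothetical.
LABELS = {
    "ex:place/0001": {
        "en": "Germany",
        "de": "Deutschland",
        "fr": "Allemagne",
        "ja": "ドイツ",
    },
}


def display_label(code, lang, fallback="en"):
    """Return the label for `code` in `lang`, or the fallback language."""
    labels = LABELS[code]
    return labels.get(lang, labels[fallback])


# The cataloguer records the code once; each discovery environment
# picks whichever string suits its users.
print(display_label("ex:place/0001", "de"))
print(display_label("ex:place/0001", "es"))  # no Spanish label, falls back to English
```

The point of the sketch is that the record itself never changes when a new language is added; only the table of labels grows.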

To suggest that all string-based metadata is of the past and will disappear soon, and that all new metadata will be code-based representations of “things”, creates a false dichotomy. We are not likely ever to get rid of all string-based metadata entirely. In fact, we need metadata which makes the links between the representative codes and their equivalent text strings in various languages and scripts. In our current environment, we are gradually moving into a hybrid situation. Our “string-based” MARC records are now being enriched by OCLC, which is adding URIs in the $0 subfields. Those URIs are the new code equivalent for the “thing” which is expressed by the string. While it is hard to imagine a form of library metadata or data which is not readable in the record format to which we are accustomed, we now know that the vast majority of academic library discovery metadata will take the form of linked data which has no “record structure”. The recent addition of the $0 subfield to MARC records is our first step into the new world of library data. It is a sign that the transition is real and that it has begun. I certainly will be interested to see, first of all, how the major transition which needs to occur will unfold and, secondly, how we can improve our discovery environments by making the changes. I’m sure that in the upcoming months and years this blog will revisit the issue of “strings versus things” from time to time.
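
To make the hybrid situation a little more concrete, here is a rough sketch of a heading that carries both the familiar string and a $0 URI. The `Field` structure, the helper functions and the `example.org` URI are assumptions made up for this illustration; this is not how any particular MARC library or OCLC service represents the data.

```python
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    """A simplified stand-in for one MARC field (illustrative only)."""
    tag: str
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs


def heading_string(field: Field) -> str:
    """The human-readable string: every subfield except $0."""
    return " ".join(value for code, value in field.subfields if code != "0")


def thing_uri(field: Field) -> Optional[str]:
    """The URI for the "thing", if a $0 subfield holds one."""
    for code, value in field.subfields:
        if code == "0" and value.startswith(("http://", "https://")):
            return value
    return None


# A traditional string-only heading, and the same heading enriched with
# a $0 URI. The URI below is a hypothetical placeholder.
plain = Field("100", [("a", "Trudeau, Justin")])
enriched = Field("100", [("a", "Trudeau, Justin"),
                         ("0", "http://example.org/authority/12345")])

print(heading_string(enriched), "|", thing_uri(enriched))
print(heading_string(plain), "|", thing_uri(plain))
```

Both headings still read the same to a human; only the enriched one gives a machine something unambiguous to link on.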

Now to round off this post, I will introduce a tool of interest. In this case it’s not so much a tool as an interesting experiment that OCLC is working on to demonstrate how authority data and linked data can be used to create a useful discovery environment. This experiment is called WorldCat Identities and can be found at: .  To see the results of the experiment in linking up various sources of linked data, users can just click one of the top 100 names that are listed on the main page. Or you can search for the name of a person. When I demonstrate this site to others, I often pick Justin Trudeau, seeing as he has both created resources (e.g. writing, interviews, theatrical performances) and has had works written about him. It is interesting how the linked data can be brought together to create timelines and also provide links to other persons. You may also notice with Justin Trudeau that there is some orphaned metadata which forms its own result set and doesn’t seem to link back to the other content. I think that it’s this sort of problem which points to an area toward which the efforts of cataloguers will need to be redirected in the near future. From time to time, I check into this website and search the same names to see if known problems have been resolved or if anything new has been added. I’ve found that it’s not unusual to find new content or relationships displayed. What hasn’t changed are the problems with what we would call, in traditional cataloging, authority control.
