Metadata Update # 29 - Strings Versus Things
“Strings versus things” is a common topic of debate
in cataloging circles lately. The idea
that a string of text which represents or describes something is more difficult
to construct and less versatile than an assigned code which can be mapped
to represent the “thing” is not new. The reality is that there is a long
tradition of “string creation” in the realm of library metadata. The science of creating metadata for
libraries had its origin in a time before the invention of electronic
computers. The earliest metadata was
recorded by human beings on paper or in a paper-based location; coded using
human language; and read and interpreted directly by the human eye. Humans read and can readily make sense of
words and sentences. Words and sentences
are made up of strings of text. In order
to make sense of the strings of text, they are typically organized in a certain
way (e.g. ISBD). When the MARC standard
was developed in the late 1960s, it was built to organize and format strings of
text in a container which could be read by computers. The Library of Congress could, for
example, have decided over 40 years ago to convert library metadata to something
less “string-based”, and the tradition of constructing complex
strings could have gradually shifted toward the use of codes to represent
“things”. For the most part, this transition has not happened.
There is an irony in holding tight to the “strings”
model. Many librarians and library
workers may find “string-based” metadata more user-friendly because the values
are expressed in human language. A
person who is not trained in library metadata standards can generally read and
make sense of string-based metadata.
Thus many library workers may feel comfortable with and prefer the idea
of string-based metadata. The irony
arises from the fact that learning how to create effective strings which can
function in today’s crowded information environment is a challenging task. This year at ALA Midwinter, I heard a number
of speakers discuss the problem with the length of time that it takes for a new
librarian to build basic cataloging skills, let alone become an expert. Most experienced cataloguers will estimate
the learning period as lasting from two to three years. That time estimate assumes that the new
librarian has a local expert to instruct and mentor them through the learning
process. In smaller libraries, the
reality is that the new librarian may be doing much of the learning on his or
her own, which makes the learning curve a little steeper. From my own experience with taking the NACO
training and going through the review process, I can attest to the complexity
that lies behind learning to do a task which may seem, on the surface, to be
little more than data entry. So,
considering that it takes all of this time and effort to learn how to create
string-based metadata, one might assume that the quality and utility of the
metadata would be superior to other forms of metadata. While the quality is likely to be equal, many
may find it initially surprising that code-based representations of
“things” are actually much more useful and versatile. The reality is that once a code is assigned
to a person, place, thing or concept (etc.), that code can be mapped to multiple
languages and scripts. The ability to map the codes in this manner helps in
addressing a new discovery environment which is not only diverse locally but
must also serve an increasingly global audience. Thus, by assigning a code once for all
discovery environments, the metadata creator can work in his or her native
language to create highly flexible, internationalized metadata. Perhaps
libraries’ recent experiments with BIBFRAME and other forms of linked data are
starting to bring this reality into focus.
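To make the idea of mapping one code to many strings concrete, here is a minimal sketch in Python. The identifier, the label table, the sample record and the function name are all made up for illustration; a real system would take its identifiers and labels from an authority file or linked data service rather than a hard-coded dictionary.

```python
from typing import Dict

# Hypothetical identifier for the concept "cats". A real identifier would
# be a URI issued by an authority file, not an example.org address.
CATS = "http://example.org/things/cats"

# One code, many display strings, keyed by language tag.
LABELS: Dict[str, Dict[str, str]] = {
    CATS: {"en": "Cats", "fr": "Chats", "de": "Katzen", "ja": "猫"},
}


def display_label(code: str, language: str, fallback: str = "en") -> str:
    """Return the label for a code in the requested language.

    Falls back to the `fallback` language when no label exists for
    the requested one.
    """
    labels = LABELS[code]
    return labels.get(language, labels[fallback])


# The (invented) record stores only the code for its subject. The string
# shown to the user is chosen at display time.
record = {"title": "An Example Book", "subject": CATS}

french_label = display_label(record["subject"], "fr")
spanish_label = display_label(record["subject"], "es")  # no Spanish label, so falls back to English
```

The cataloguer assigns the code once. Adding a label in another language or script later means adding one entry to the table, and no record has to be edited.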
To suggest that all string-based metadata belongs to the past
and will soon disappear, and that all new metadata will be code-based
representations of “things”, creates a false dichotomy. We are not likely ever to get rid of
all string-based metadata entirely. In fact, we
need metadata which makes the links between the representative codes and their
equivalent text strings in various languages and scripts. In our current environment, we are gradually
moving into a hybrid situation. Our
“string-based” MARC records are now being enriched by OCLC, which is adding URIs
in the $0 subfields. Those URIs are the new
code equivalent for the “thing” which is expressed by the string. While it is hard to imagine a form of library
metadata or data which is not readable in the record format to which we are
accustomed, we now know that the vast majority of academic library discovery
metadata will take the form of linked data which has no “record
structure”. The recent addition of the
$0 subfield to MARC records is our first step into the new world of library
data. It is a sign that the transition
is real and that it has begun. I
certainly will be interested to see, first of all, how the major transition
which needs to occur will unfold and, secondly, how we can improve our
discovery environments by making the changes.
I’m sure that in the upcoming months and years this blog will revisit
the issue of “strings versus things” from time to time.
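The hybrid situation described above, where a heading string and a URI for the “thing” sit side by side in one field, can be sketched in a few lines of Python. This is a simplified illustration only: the field is modelled as a plain list of subfield pairs instead of real MARC, the URI is a placeholder, and the helper names are invented for this example.

```python
from typing import List, Optional, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)

# Simplified, hypothetical subject field: the heading string is in $a
# and a placeholder URI for the "thing" is in $0.
subject_field: List[Subfield] = [
    ("a", "Cats"),
    ("0", "http://example.org/authorities/cats"),  # placeholder, not a real authority URI
]


def heading_string(subfields: List[Subfield]) -> str:
    """Join the human-readable subfields, skipping the $0 identifier."""
    return " -- ".join(value for code, value in subfields if code != "0")


def thing_uri(subfields: List[Subfield]) -> Optional[str]:
    """Return the first $0 value that looks like a URI, or None if there is none."""
    for code, value in subfields:
        if code == "0" and value.startswith(("http://", "https://")):
            return value
    return None


string_part = heading_string(subject_field)  # what a person reads
thing_part = thing_uri(subject_field)        # what a machine can link on
```

A display could keep showing the string while a linked data service follows the URI, which is why the two forms can coexist in one record during the transition.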
Now to round off this post, I will introduce a tool of
interest. In this case it’s not so much a
tool as an interesting experiment that OCLC is working on to demonstrate how
authority data and linked data can be used to create a useful discovery
environment. This experiment is called
WorldCat Identities and can be found at:
https://www.worldcat.org/identities/
To see the results of the experiment
in linking up various sources of linked data, users can just click one of the
top 100 names that are listed on the main page.
Or, you can search for the name of a person. When I demonstrate this site to others, I
often pick Justin Trudeau, seeing as he has both created resources (e.g. writing,
interviews, theatrical performances) and has had works written about him. It is interesting how the linked data can be
brought together to create timelines and also provide links to other
persons. You may also notice with Justin
Trudeau that there is some orphaned metadata which forms its own result set
and doesn’t seem to link back to the other content. I think that this sort of problem
points to an area toward which the efforts of cataloguers will need to be
redirected in the near future. From
time to time, I check in on this website and search the same names to see if
known problems have been resolved or if anything new has been added. I’ve found that it’s not unusual to find new
content or relationships displayed.
What hasn’t changed are the problems with what we would call, in
traditional cataloging, authority control.