Sunday, 13 March 2016

Metadata Update #30 Pollution vs. Perfection - why trying to get the metadata right is a challenge

Wow, time sure goes fast.  No wonder I got behind in writing my posts in the last year or so.  Now I have a big list of topics which I have intended to write about but never got around to, some of which have since become irrelevant.  So, a few are easy to cross off the list.

There’s one topic which I see has been mentioned in some form more than once on the list.  As I was making an attempt to preview the April updates to the RDA Toolkit, I realized that I had not yet looked at the February 2016 updates.  (For those of you who want to check out the April changes, they are found here:  http://www.rda-rsc.org/sites/all/files/RDA-changes-2015-proposals.pdf)  As I got over various twinges of frustration at my realization that I remain forever behind in reading the updates, I was reminded of the theme that I had found running across the various possible blog topics I had listed.  This theme was essentially the challenge of keeping up with things in times of change.  It definitely is a challenge which seems to be taking up an increasing chunk of my Sunday evenings!

Earlier this week I heard someone comment that she had been told that cataloguers are overly picky about things that don’t matter.  This is something that I have heard before too.  I’ve heard that we obsess and want things to be perfect.  When I’ve heard this sort of comment, I want to laugh at how ridiculous it sounds but then I realise that the person making the comment really doesn’t understand so it would be rude to laugh.  With all of the change that is going on, I’m not sure what perfection is and I certainly don’t think that anyone would purposely fuss over things that don’t matter – not in the last few years.  Sometimes in times of change, it’s a challenge to sort out what matters and what doesn’t – that’s why I often have to go back to the basic models that the new theory and practice are built on.  In reality, I find that most days I try to sort through the models, instructions and guidelines and just do my best.  The idea of fussing around and attempting perfection sounds like something akin to setting up a tent in the midst of a tornado.  In reality, we’re pulling the lawn chairs in and running for cover!

Maybe it was true in the past that cataloguers would get some sort of power trip over creating records that perfectly aligned with the old cataloguing rules.  Maybe that level of standardization and “perfection” was irrelevant and laughable – it might be the foundation of a stereotype which seems to still exist today.  In print and siloed electronic environments, the need for standardization wasn’t all that real.  But things have changed a lot in a short period of time.  In the recent work I’ve been doing on our internal metadata flows, I’ve realized that one little code in a control field can cause a malfunction which can have an impact on other libraries across North America.  Does that sound a little over the top?  Well, that’s a trailer for an upcoming paper I hope to get published later this year.  I have the data to prove it.  Some of those little numbers and letters really are a big deal and you have to get them right.  What I didn’t fully appreciate until I set out to study what was going on in detail was how extensive a problem could result from a consistent mistake in records.  The reality is that not only do our systems talk to each other but because of WorldCat and Z39.50 our metadata can be scanned and used by external agencies for multiple possible purposes.  Bad metadata is like pollution.  If you put bad things in the air or water, it not only poisons your environment but it poisons the whole earth.  Yeah, it really does.  I’m talking like the David Suzuki of metadata.

We can see that we now live in a brand new world when it comes to information discovery.  The walls between libraries are gradually falling down and our metadata is moving and exchanging in ways we would never have imagined as little as three or four years ago.  In this new environment things do need to be certain ways for the global system to work.  But, how does a person know what to worry about and what to let go?  There certainly have to be some things that can be let go or else we would make ourselves insane!  This is the $64,000 question.  Unlike the contemporary “Who wants to be a millionaire?” there are no “lifelines”.  We have some reasonable ideas about what we need to focus on and what the best choices are but the reality is that only time will tell where we are on the right track and where we are making mistakes.  The models guide us but the models keep changing because people keep bumping up against their limits and finding places where they don’t work.  I suspect that only contemporary cataloguers who have been following the rapid succession of developments since 2013 would really understand or appreciate my statements in this regard.  Just like the work that I’m doing right now to uncover the results of some coding irregularities, I suspect that sometime in the near future there will be someone who will be combing through my work to massage out the problematic spots and replace them with more enlightened coding.  For the last 3 years I have increasingly come to embrace the idea that living with ambiguity is the name of the game and that part of learning is that we will make mistakes.  One of the cool things that I have been learning is that with all of the new tools we have at hand, it may take some brain power to figure out the problem and an approach to solving it but it may not take very much actual effort to implement the actions to rectify the problem.  Some of the outcomes we can now achieve with ease, we wouldn’t even have begun to attempt a few years ago.

As libraries move toward the transition away from our record-based metadata containers and toward linked data environments, the need to “get it right” seems more real and more pressing.  If one piece of data is intended to link to another piece of data and that other piece is either missing or wrong, it’s not hard to imagine how the web of links could be broken and our discovery systems could fail.  As we work all of this out, I can see how those on the outside could see this as nitpicking and useless worry about detail.  However, unlike in our former cataloguing environments, the need to get things to line up properly appears to be a requirement and not just the product of obsessions over perfection – if that ever truly was all that common.  It makes me think of the mechanic who might work on your car or the technician who hooks the gas up to your furnace.  I doubt that anyone would criticize the person doing this work as nitpicking or being unnecessarily careful in terms of making sure that all of the correct parts are in place, that they have been installed correctly and are in good working order.  I suspect that it will take a while for the larger library community to understand the reality that we now experience.  We will need to help people to gradually come to understand it.  It didn’t happen overnight for us, so it’s reasonable to think that it may take even longer for those who have less of a vested interest in understanding it.  However, the comments are a little disheartening in the meantime.
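
For anyone who would like to see the broken-link idea in miniature, here is a rough sketch in Python.  It is only an illustration under my own assumptions: the identifiers ("work:1", "agent:2" and so on) and the function name find_broken_links are made up for the example and do not come from any real system or vocabulary.  Each entry lists the other entries it links to, and the function reports every link whose target is missing from the set.

```python
def find_broken_links(records):
    """Return (source_id, target_id) pairs whose target is not in the dataset."""
    known_ids = set(records)
    broken = []
    for source_id, targets in records.items():
        for target_id in targets:
            if target_id not in known_ids:
                broken.append((source_id, target_id))
    return broken


# Hypothetical mini-dataset: keys are identifiers, values are the links they assert.
records = {
    "work:1": ["agent:1", "subject:1"],
    "agent:1": [],
    "subject:1": [],
    "work:2": ["agent:2"],  # "agent:2" was never created, so this link dangles
}

for source_id, target_id in find_broken_links(records):
    print(f"{source_id} points to missing {target_id}")
```

One mistyped identifier is enough to leave "work:2" stranded, which is the whole point: in a linked environment a small slip is not cosmetic, it disconnects things.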

So, on the topic of linked data, I thought that a good tool for this update would be LC’s Linked Data Tool where you can look up URIs for the various controlled vocabularies which have already been mapped:  http://id.loc.gov/

As one final comment for this post, I find LC’s comment about the LCC in the search engine to be both interesting and “telling” with regard to the subject of this post:  “LC Classification entries are not included in general search results. You must explicitly select LC Classification in order to search the scheme. This is temporary while the impact of adding LCC to the current system is better understood.”  Wow again.  LC doesn’t know?!  I think that it can provide the basis of a new mantra for cataloguers and metadata specialists:  “this is temporary until what we need to do is better understood”.  However, in the meantime we need to do our work, so we do the best that we can knowing that we likely won’t get it quite right – let alone get it perfect, whatever that might be.