Wednesday, 8 August 2012

Google Book Search Project

This morning I was reading an interesting blog post by Karen Coyle about the Google Book Search Project.  Apparently Google filed a case claiming that digitizing a book in order to make it searchable is "fair use" and thus legal in the U.S., as long as proper precautions are taken to ensure that the digitization doesn't essentially make an entire copyrighted book, or substantial sections of it, free to use and download.

One of the arguments made by Google is that full text searching is far superior to searching using library catalogues.  Karen argues that this is essentially "throwing libraries under the bus".  She says that full text searching should be seen as complementary to "standards-based metadata", not a superior replacement for it.  I agree with her.  When I used to do a lot of reference work, I found that there were times when full text searching was critical.  For example, a patron might come into the library knowing only a line of a poem or song and want to find the full version.  Granger’s poetry index only indexes certain types of poetry in certain ways, and for some reason I couldn’t find a sniff of any poem that begins “I can’t get enoughsky of Lizzie Pitofsky”!  A Google search for a line of text was usually successful, at least if the patron got the words and spelling more or less correct.  However, I can think of many times when the actual cataloguing of materials helped users do a fairly exhaustive search of the public library's collection for materials about a certain topic, a search which would never have been possible or reasonable with full text searching alone.

There is a term that sounds really geeky, disambiguation, which names a metadata goal that I believe is key to some of the best stuff that cataloguing in libraries can do for users.  Basically, library catalogues help users to tell the difference between a table that is a piece of furniture and the type that you might use in a chemistry or mathematics class; to know the difference between Sydney, Australia and Sydney, Nova Scotia; or to tell the difference between mechanical and psychological stress.  That, essentially, is what disambiguation is.  I’ve used full text searching in systems such as Dialog where I had to sit down with a piece of paper and figure out, for example, that if I wanted to find information about chemical tables but not the periodic table, I would have to string together some sort of Boolean search which includes a “not” for the term “periodic” and perhaps use some proximity operators so that I wouldn’t end up with every chemistry publication that happened to have a table of contents.  However, in the library catalogue there is a subject heading “chemistry – tables”.  A bit of typing and a few clicks and I have a nice list of resources.  Not so in Dialog.  So, why wouldn’t I just start with the library’s catalogue?  Makes sense to me that I would.
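
For anyone who likes to see this sort of thing spelled out, here is a small Python sketch of the contrast.  It is only a toy under stated assumptions: the four records, their text and their subject headings are invented for illustration, the function names are my own, and the search logic is not Dialog's actual command syntax or any real catalogue's software.  The full text search needs a proximity test plus a "not" to get rid of the periodic table and the table of contents, while the subject heading search is a single lookup.

```python
import re

# Hypothetical toy records: titles, text and headings are invented.
RECORDS = [
    {"title": "Handbook of Chemical Data",
     "text": "Chemistry tables of boiling points and densities.",
     "subjects": ["Chemistry -- Tables"]},
    {"title": "The Periodic Table Explained",
     "text": "The periodic table is the best-known chemistry table.",
     "subjects": ["Periodic table"]},
    {"title": "Intro to Organic Chemistry",
     "text": "Table of contents. Chapter one covers basic chemistry.",
     "subjects": ["Chemistry, Organic"]},
    {"title": "Building a Kitchen Table",
     "text": "How to build a sturdy wooden table.",
     "subjects": ["Furniture making"]},
]


def words(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def near(tokens, stem_a, stem_b, max_gap):
    """True if a word starting with stem_a and a word starting with
    stem_b occur within max_gap positions of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t.startswith(stem_a)]
    pos_b = [i for i, t in enumerate(tokens) if t.startswith(stem_b)]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def full_text_search(records):
    """Boolean search: 'chemi*' near 'table*', NOT 'periodic'."""
    hits = []
    for record in records:
        tokens = words(record["text"])
        if near(tokens, "chemi", "table", 2) and "periodic" not in tokens:
            hits.append(record["title"])
    return hits


def subject_search(records, heading):
    """Catalogue-style search: match an assigned subject heading."""
    wanted = heading.lower()
    return [r["title"] for r in records
            if wanted in (s.lower() for s in r["subjects"])]


if __name__ == "__main__":
    print(full_text_search(RECORDS))
    print(subject_search(RECORDS, "Chemistry -- Tables"))
```

Both searches land on the same single record in this toy collection, but the full text version only works because I already knew which false hits to engineer away.  The subject heading did that work ahead of time, when the item was catalogued.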

So, in my experience, the standards-based metadata used in library catalogues is generally effective and efficient for most types of information searches.  However, when a person is really looking for a needle in a haystack and the colour and size of that needle are known, full text searching is a real lifesaver.  I do agree that it is most unfortunate that Google has chosen to speak of library catalogues in a way that devalues them in order to make its case.  Still, it is interesting that new opportunities for digitizing materials for full text searching could be opening up in the near future.  If you’d like to read the entire case filing, the link is here: