NFAIS Enotes, August 31, 2009

During a museum trip this August, my spouse and I happened upon an exhibit on the Golden Age of American book illustration. I was particularly taken by a single illustration from an 1898 edition of Tennyson’s Idylls of the King: Vivien, Elaine, Enid, Guinevere. Unfamiliar with the artist, I decided on returning home to run a quick search to satisfy my curiosity. In framing my Google query, I had only the minimal data provided on the museum plaque: the artist’s name, the publication title (given only as Idylls), and the date.

I had the name of the artist as it appeared in the museum, but there were also associated variants – Louis John Rhead (the museum’s preferred designation), Louis J. Rhead, Louis Rhead – as well as the inverted forms of that name – Rhead, Louis, and Rhead, Louis John. In exploring result sets, I gradually picked up some additional subject headings that I employed in subsequent queries – Art Nouveau, Brandywine Valley Tradition, American Illustration (1895-1945) with yet another variant, American Illustration (1880 – 1914). Less successfully, due to the nature of the original artwork and its subject matter, I also tried the following:

  • Rhead illustration “Arthurian myth”
  • Rhead illustration “King Arthur”
  • Golden Age “American Illustration”
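
The name variants described above can be generated mechanically rather than typed one at a time. The following is only a minimal sketch: the function names `name_variants` and `or_query` are hypothetical, and it assumes a search engine that treats quoted strings as phrases and accepts an uppercase OR between them, which not every system does.

```python
def name_variants(first, middle, last):
    """Return forward and inverted forms of a personal name."""
    initial = middle[0] + "."
    forward = [
        f"{first} {middle} {last}",   # Louis John Rhead
        f"{first} {initial} {last}",  # Louis J. Rhead
        f"{first} {last}",            # Louis Rhead
    ]
    inverted = [
        f"{last}, {first}",           # Rhead, Louis
        f"{last}, {first} {middle}",  # Rhead, Louis John
    ]
    return forward + inverted


def or_query(variants):
    """Join the variants as quoted phrases in a single OR query string."""
    return " OR ".join(f'"{v}"' for v in variants)


variants = name_variants("Louis", "John", "Rhead")
print(or_query(variants))
```

A controlled-vocabulary system does this work on the searcher's behalf by mapping every variant to one authorized heading; the sketch only shows how much of the burden otherwise falls on the person typing the query.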

By the end of two and a half hours, I had gained a basic understanding of the artist’s life and a sense of his work in historical context, viewed six to eight posters and periodical covers, and ultimately found a downloadable PDF in Google Books of the scanned text of Tennyson with all sixty of Rhead’s illustrations. Net result? I had what I wanted, but I came away somewhat discontented. The process was more laborious than I had anticipated.

Do note the search process I’ve described is a casual one in the context of the humanities, but the search elements of creator name, subject headings, and random key words or phrases are common to scientific, technical and medical queries as well. Descriptive terms for the various phases of my information-gathering activity might accurately include more than one of the following: searching, scanning, browsing, assessing.

Those terms are indicative of the behavior discussed in a recent Science article, “Strategic Reading, Ontologies, and the Future of Scientific Publishing” (Allan H. Renear, et al. Science 324, 828 (2009); DOI: 10.1126/science.1157784). As the abstract of that article indicates, “An observed recent increase in strategic reading in the online environment will soon be further intensified by two current trends: (i) the widespread use of digital indexing, retrieval and navigation sources and (ii) the emergence within many scientific disciplines of interoperable ontologies.” The behavior I showed while searching for information about Louis John Rhead is perhaps a shallow version of strategic reading, but the key elements of searching, scanning, and evaluating are present. Renear’s article makes the point that ontologies are necessary supports to this kind of information-seeking behavior in a digital environment.

Those working in search know this. Gene Golovchinsky discusses how to improve such exploratory searching at the FXPAL blog; he also discusses the use of MeSH headings in the context of searching.

There was some Web-based buzz surrounding the Science article. Lorcan Dempsey of OCLC quoted two or three sentences from the article, asking what kinds of services “strategic readers” might want to have offered to them. Based on earlier feedback from the academic community, he suggests that a key requirement is a system’s capacity to reveal patterns, relationships, and/or judgments.

The discussion of the same article on Friendfeed was less focused. There was a little mild mocking of the article’s conclusion that functionality would either be a part of an application interface or else a browser plug-in. One commenter, however, suggested that the piece was more about documenting system requirements in advance of policymakers’ gradual movement towards acceptance of Science 2.0 forms of publication. [http://friendfeed.com/the-life-scientists/00520f08/science-reviews-revolution-in-strategic] A similar point was put forward (although not specifically in relation to the Science article) over at a consulting firm’s blog, where the blogger underscored the value of taxonomies in navigating oceans of information, saying, “First we have no data. Then we have too much data. Then we find ways to make sense of what we have. Then we need more data. Which means we don't have data. Until we have too much data. And then we'll find ways to make sense of the data we have. And repeat.”

Google failed my strategic reading exercise when it could not deliver satisfactory results for the following queries:

  • Rhead illustration “Arthurian myth”
  • Rhead illustration “King Arthur”
  • Golden Age “American Illustration”

It did, however, infer some things. On the queries shown above, Google floundered on who or what Rhead was, but it could associate (in some hidden fashion) Howard Pyle with illustration, Thomas Malory with King Arthur, and King Arthur with “Arthurian Legends”. I know this because the system asked me if I cared to search on those options as alternative routes to information that might be “close” to the answer I was looking for. These alternatives are dynamically generated.

Through a separate search in Google Books, I eventually was able to dig up the edition of Tennyson that Rhead had illustrated. However, the results Google itself pulled out of Google Books included:

  • King Arthur and the Knights of the Round Table ... - by Thomas Malory, Charles Morris - 266 pages
  • The Story of King Arthur and His Knights - by Howard Pyle - 338 pages
  • The Story of King Arthur - by Tom Crawford, John Green - 95 pages

Even a query such as [Rhead illustrator Tennyson] failed to provoke either Google or Google Books into yielding up the correct work. Granted, a three-word query isn’t much to go by, but it was irritating that Google didn’t recognize that two of those three words were proper names. It kept incorrectly parsing the name Rhead as either read or head. It was only by going to “advanced search” in Google Books and searching on Louis Rhead as the author that I finally unearthed the book illustration that I had seen in the museum.

It is not news that Google’s use of book metadata is sadly deficient, as Geoff Nunberg of the University of California, Berkeley recently pointed out. But, given the broad scope of content at Google’s command as well as what it knows about me as a user, why couldn’t that system do better? I was logged in under a long-standing Google identity, searching for information in a single session and using consecutive iterations of a constructed query. Why doesn’t Google leverage that knowledge insofar as it is able? The indexing power of its algorithms produces the spidered results of various sites (Wikipedia, Artcyclopedia.com, Flickr, its own Google Books), but Google (even in its “personalized” version) did not (and perhaps truly cannot) piece together two hours of consecutive search queries and understand the nature of my information need, along these lines:

On August 28, 2009, Jill O’Neill is searching for 19th-century American book illustrator and graphic artist, Louis John Rhead, and related concepts. Based on queries and click-through behavior during the current 30-minute search session, she appears to have an interest in viewing examples of his artistic output. Based on her previous Web history with us (Google), result sets should be predominantly from high-quality, high-ranking informational sites (.edu or .org) with preferred reading level at or above 8th grade.

It is not that Google lacks the data; it is that Google hasn’t done what Lorcan Dempsey suggested – the leveraging of pattern matching, relationships and judgment – in order to deliver to me a more satisfying information-seeking experience. At least at this point in time, Google doesn’t appear to have much interest in delivering “smart search”.   It still depends on brute retrieval and the notion that determination of relevance resides with the user.

As noted both at recent NFAIS events and in recent issues of these NFAIS Enotes, Google is paying more attention to developing its broader platform/operating system than to enhancing my user experience. There are good reasons: unresolved issues ranging from privacy to metadata quality to intellectual property concerns are part of the existing gap. The company is interested in handling major challenges that affect global populations, as made evident in two faculty members’ write-up of their attendance at the Google Computer Science Faculty Summit. It is focused on machine translation and on working with extremely large data sets in the context of STM.

Google has also been working on the latest version of its Chrome browser, a strategic element in creating a new Web working environment that will surpass Microsoft’s Office Web environment. For the record, Chrome 3.0 is entirely stable and comfortable to use, despite the lack of Firefox-style plug-ins or IE8 enhancements. For more on the current game of one-upmanship between Google and Microsoft, visit this post.

That same Wired article notes that Google is ahead of Microsoft in designing for mobile devices, with 18 phones due out this fall, each running the Android operating system. Had I been carrying a smart phone on my trip to the museum, one with access to a 3G or 4G network and an unlimited data plan, I might have done some percentage of that searching on-site at the museum and further enhanced the learning experience. (For more on mobile search interfaces, see this chapter from Marti Hearst’s forthcoming book, Search User Interfaces (Cambridge University Press, Oct 2009).) Or, had it been a particularly forward-looking museum, it might have provided me with an augmented reality application to amplify my on-site experience. An entry at the Mashable site describes augmented reality apps as applications that “combine virtual data into the physical real world by utilizing the iPhone 3GS or an Android phone’s compass, camera, and GPS system. The result is that you can see things like the location of Twitter users and local restaurants in the physical world, even if they are miles away.” The entry shows six currently available applications.

NFAIS members will, I hope, take away from this the idea that the Google search experience continues to be full of vulnerabilities, vulnerabilities that serious researchers will want or need to avoid.  Google seems relatively distracted at the moment from that concern.  Is there a way to turn this to the advantage of those NFAIS member organizations who do understand the value of ontologies in the context of a workflow? For those whose systems and data are already equipped to reveal patterns, relationships, and/or judgments?