The Joy of Search: A Review Over Time

This piece began as a book review, but in thinking about Google Search, it quickly became a reflection on the changes I have witnessed in the online search environment. Ten years ago, I wrote an article for the NFAIS membership discussing a particular Google search experience. My husband and I had visited a museum exhibit on book illustration while on vacation and had examined a woodcut from the 1898 edition of Tennyson’s Idylls of the King: Vivien, Elaine, Enid, Guinevere. Unfamiliar with the artist, I ran a Google search upon returning home. More accurately, I ran a slew of Google searches. In framing my initial query, I had only the minimal metadata provided on the museum plaque: the artist’s name, the publication title (given only as Idylls), and the date. That proved insufficient. A second query, [Rhead illustrator Tennyson], also failed to elicit the correct work from Google.

Granted, a three-word query isn’t much to go on, but the limits of a Google search in 2009 were fairly evident. For one thing, Google’s system could not recognize that two of those three words were proper names. In parsing my search, it consistently replaced the proper name Rhead with more familiar terms, either read or head. So I ended up formulating and tweaking an extended series of searches in order to complete my information task.

At the end of two hours, I had satisfied my curiosity. I had acquired a basic understanding of the artist’s life, a sense of his work in historical context, viewed six to eight posters and periodical covers and, after deliberately inputting incorrect metadata, located a downloadable PDF in Google Books of the scanned text of Tennyson with all sixty of Rhead’s illustrations. My subsequent write-up still expressed discontent. Given the broad scope of content at Google’s command in 2009 as well as what it knew about me as a user at that point in time, the research process had taken more time and been more laborious than I’d have thought necessary.

Where was the system’s intelligence? I was logged in under a long-standing Google identity with an associated search history. I was running consecutive iterations of a search query within a single session. Why didn’t Google leverage that in generating useful results? The indexing power of its algorithms produced spidered results from various sites (Wikipedia, Artcyclopedia.com, Flickr, its own Google Books), but Google did not (perhaps truly could not) piece together two hours’ worth of search activity and understand the nature of my information requirement. Ideally, the system would have reasoned along these lines: “On August 28, 2009, Jill O’Neill is searching for information about 19th-century American book illustrator and graphic artist Louis John Rhead, and related concepts. Based on queries and click-through behavior during the current search session, she appears to have an interest in viewing examples of his artistic output. Based on her previous Web history with us (Google), result sets should predominantly be from high-quality, high-ranking informational sites (.edu or .org), with college reading level preferred.”

My intent in writing that piece was to show NFAIS members that the Google search experience remained rife with vulnerabilities, vulnerabilities that serious researchers need to avoid. Three years later, again in an issue of NFAIS Enotes, I revisited that search experience and evaluated Google’s performance. In 2012, CNET analysts were themselves grumbling that Google was failing to deliver useful search results.

Fast forward to 2019. In mid-June, Barbara Fister wrote a brief essay in Inside Higher Ed noting the vulnerabilities of discovery in libraries, whether via the card catalog or via Google. Specifically, she wrote: “When libraries got jealous of Google’s popularity, we thought we could reconstruct its simplicity, but we don’t have Google’s engineers and the kinds of information we offer is not easily put on a Google Knowledge Graph Card...But, to be honest, library systems of cataloging and classification have never been great when it comes to seeing connections and building knowledge through understanding context. Maybe Google has created the false impression that finding answers is easy...”

All of this came together for me as I sat in a session during ALA. I had the happy experience of sitting down in the wrong session but serendipitously enjoying a worthwhile talk from a search expert. Daniel M. Russell holds the title of Senior Research Scientist for Search Quality at Google. For roughly 13 years, he has worked actively to educate users in the appropriate use of various Google search tools. (One of his recent articles appeared in Scientific American.) His appearance at ALA was intended to generate interest in his forthcoming book, The Joy of Search (MIT Press, September 2019). As for Fister’s point, Russell is well aware that finding answers is not easy, and his talk (as well as his book) reflects that awareness.

After reading an ARC of Russell’s book, kindly provided by MIT Press, I was reminded of the complexities of online search. Russell’s talk returned to the concept of GIGO, or “Garbage In, Garbage Out.” If users aren’t taught the thinking required to frame a research question, how to craft a search query, and how to document the results derived from that query, they will not do well.

Russell’s book is a useful text, focused on helping users develop the skills to think strategically about solving an information problem. The book consists of 20 chapters, each of which offers examples of complex information-seeking tasks and then lays out the thought and search processes used to satisfy the inquiry. For example, consider the questions that serve as the titles of these three chapters:

  • Chapter 11: Can You Die from Apoplexy or Rose Catarrh?
  • Chapter 14: What’s the Connection between “The Star Spangled Banner” and the General Who Burned the White House?
  • Chapter 16: Is Abyssinia the Same as Eritrea?

The overarching point is that while Web resources make an amazing depth of content available to the information seeker, one must realize that Google (or any other search tool) will not always provide a nicely prepackaged answer. The individual may need to formulate a new search strategy, or even a series of strategies, to satisfy a particular need. Russell is clear about this in his book, although the following quote doesn’t appear until the closing pages: “…search engines don’t signal that they lack the knowledge to supply an answer, yet they don’t want to look bad so they give a Web-search set of results instead. That’s a great fallback position, but it’s also an important difference between an answer and a set of search results.”

Reinforcing that awareness is even more important given the rapid expansion (in volume alone) of openly accessible content on the Web. In his talk at ALA, Russell noted that every minute of the day, more than 400 hours of video content is uploaded to YouTube, and that learning-related videos are viewed more than a billion times daily. He also noted the diversity of content forms, citing the imagery in Google Street View.

Consider as well the issues associated with locating data sets. One useful chapter walks the reader through Google Dataset Search, a tool still in beta that covers public data sets. Search for African American population and you get a wealth of economic data from the St. Louis Federal Reserve. Searching for something as specific as “Time spent on reading the Bible in the U.S. from 2013 and 2017” yields reliable results from Statista, a paywalled site. On the downside, while Russell notes in later chapters that searchers need to be aware of the coverage and limitations of the resources they search, the FAQ for Google Dataset Search is remarkably vague on both counts. What data has been uploaded there? Only by experimenting with queries would one be able to tell. Questions of scope and comprehensiveness have long been a point of friction between Google representatives and the information profession. This is one thing that has not changed.

Today, I can ask Google “Who was Louis Rhead?” and get a surprisingly intelligent response. Google has sufficient linked data to offer more than a (potentially incomplete) Wikipedia entry, although Wikipedia still ranks high in the results shown. However, right below the link to Wikipedia is a link to the Smithsonian American Art Museum record on Rhead, a reliably authoritative source of information. That record includes a Linked Open Data URI. Google also points to the finding aid on Rhead from the George Smathers Library at the University of Florida, which includes useful subject and access terms. A fair number of providers across the information spectrum have gotten smarter about what they serve up, feeding chunks of data into the maw of a developing machine intelligence. Google’s results rely on that collective effort.

To complete information tasks, users need to learn how to engage with the systems that surround them. Russell’s acknowledgements in the book express appreciation to anthropologist Mimi Ito for “reminding me that most people think of online research as a pedestrian skill that shouldn’t need any teaching…She pointed out that this book needs to be intrinsically interesting.” That’s an important insight, and Russell clearly took it to heart. His book is both interesting and informative. One hopes that some college syllabi will include it as recommended reading.

That said, we’re still a good distance from the fantasy of artificial intelligence fostered by movies and television. Back in the days of the original Star Trek, Captain Kirk would activate his ship’s computer by voice, indicate that his information task required material from one of the libraries of historical data, demand that the system supply “Information on Kodos the Executioner,” and receive the vital clue needed to track down a murderer.

More pragmatically, I do see intriguing hints of the role artificial intelligence could play in search. The Allen Institute for Artificial Intelligence offers Semantic Scholar. Drawing from more than 20 content providers, that system offers effective search mechanisms and a refreshingly open interface. JSTOR has created a prototype search tool for film, using an HBO documentary on Martin Luther King Jr., that supports searching for specific persons, issues, and concepts referenced in it. Also of interest is the Open Knowledge Maps initiative, although, being oriented purely toward the sciences, it does no better with the Louis J. Rhead search than Google did 10 years ago. Still, a representative speaking at the Open Science Conference in 2019 begged the audience not to leave search solely in the hands of Google.

For the NISO membership, as well as for the information industry as a whole, none of this may be news. Regardless of whether you are a traditional scholarly press or a search engine giant, much of what content and discovery providers contribute to the research process is submerged, with algorithms running and sorting “under the hood.” In a research environment deeply focused on furthering open access and driven by analytics, there is still much to do in support of identifiers, metadata, interoperability, and privacy protections. Is now the time for your organization to become more deeply engaged with the standards development critical to engineering the next-generation, frictionless information experience? The platforms we create cannot entirely eliminate the need for users to think, but the community could do much on its end to improve users’ chances.