Changes to Google Scholar’s Search algorithm

Thursday, September 4th, 2008

I was in Philadelphia for a meeting yesterday prior to the Society for Scholarly Publishing’s Top Management Roundtable.  I was talking to a colleague about changes to Google Scholar’s search algorithm that were released last week.  Apparently, these changes rank freely available articles higher in the search results than articles that are behind a subscription wall.  This is an interesting change that could create some stir in the community.

I have several thoughts about this change.  First, I wonder who knew about or recognized the change when it happened.  Although many (most?) researchers and students use Scholar, the underlying basis on which its results are ranked is completely unknown.  This has been a common criticism of Google for a long time.  Google’s PageRank algorithm has been the “secret sauce” and among the most highly guarded secrets in a highly secretive company, although, according to Scholar’s “About” page, the algorithm for Scholar is different from the one used by the rest of Google.  Interestingly, though no one (outside of Google) knows why one article ranks more highly on the list than another, everyone seems to rely on it, despite some research that has shown other library search services to be more effective.  Many in the community have been critical of this practically since Scholar was released.

More interesting than the ongoing debate about Google’s openness are the ramifications that this particular change has for copyright.  If an article is found in a subscription-walled system and is copied and posted to an open site, then under this change the pirated copy would appear higher in the search results than the legitimate copy.  The person I was discussing this with had seen some examples of content from their site that had been posted on open sites.  Obviously, people post content for numerous reasons and some have legitimate rights to do so.  For example, most publishers allow author self-archiving or posting to an individual’s home page.  In this case, it is probably preferable from the author’s perspective to have the freely available copy ranked higher, because it directs readers to the author’s site, where more information about the author and their work probably resides.  This is one area where NISO’s Journal Article Versions recommended practice would be usefully applied.

What is likely occurring more often is that authorized users find an article, copy the file and post it outside of the subscription wall.  This might be done knowingly or not, but it is odd that the new changes would drive traffic to files that could well be posted in violation of copyright laws.

Of course, if the algorithm changes again, we might never know.