The Memento Project – adding history to the web

Yesterday, I attended the CENDI/FLICC/NFAIS Forum on the Semantic Web: Fact or Myth hosted by the National Archives.  It was a great meeting with an overview of ongoing work, tools and new initiatives.  Hopefully, the slides will be available soon, as there was frequently more information than could be expressed in 20-minute presentations and many listed what are likely useful references for more information.  Once they are available, we’ll link through to them.

During the meeting, I had the opportunity to run into Herbert Van de Sompel, who is at the Los Alamos National Laboratory.  Herbert has had a tremendous impact on the discovery and delivery of electronic information. He played a critical role in creating the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), the Open Archives Initiative Object Reuse & Exchange specifications (OAI-ORE), the OpenURL Framework for Context-Sensitive Services, the SFX linking server, the bX scholarly recommender service, and info URI.

Herbert described his newest project, which has just been released, called the Memento Project. The Memento project proposes a “new idea related to Web Archiving, focusing on the integration of archived resources in regular Web navigation.”  In chatting briefly with Herbert, the system uses a browser plug-in to view the content of a page from a specified date.  It does this by using the underlying content management system change logs to recreate what appeared on a site at a given time.  The team has also developed some server-side Apache code that handles the request for calls to the management of systems that have version control.  The system can also point to a version of the content that exists in the Internet Archive (or other similar archive sites) for content from around that date, if the server is unable to recreate the requested page. Herbert and his team have tested this using a few wiki sites.  You can also demo the service from the LANL servers.

Here is a link to a presentation that Herbert and Michael Nelson (co-collaborator on this project) at Old Dominion University gave at the Library of Congress on this project.  There was also a story about this project  A detailed paper that describes the Memento solution is also available on the arXive site.  There is also an article on Memento in the New Scientist.  Finally, tomorrow (November 19, 2009 at 8:00 AM EST), there will be a presentation on this at OCLC as part of their Distinguished Seminar Series, which will be available online for free (RSVP required).

This is a very interesting project that addresses one of the key problems with archiving web page content, which frequently changes.  I am looking forward to the team’s future work and hoping that the project gets some broader adoption.