Introduction to NISO webinar on ebook preservation

Below are my welcoming remarks to the NISO webinar on Heritage Lost?: Ensuring the Preservation of Ebooks on May 23rd.

“Good afternoon and welcome to the second part of this NISO Two-Part Webinar on Understanding Critical Elements of E-books: Acquiring, Sharing, and Preserving.  This part is entitled Heritage Lost? Ensuring the Preservation of E-books.

Perhaps because electronic journals were adopted much earlier and more rapidly, we are more familiar with the archiving and preservation of e-journal content than of e-book content. However, just as the preservation of e-journals became a concern in the late 1990s once e-journals were prevalent, so too the preservation of e-books is now rising in the minds of people deeply concerned with the long-term preservation of cultural materials.

That is not to say that no one is considering these issues. Some of the bigger digitization projects involve libraries and as such include preservation as part of their mission. I’m thinking in particular of the Internet Archive, Portico and the HathiTrust in this regard, but there are certainly others. Today we’ll hear from two of these groups and what they are doing to support

Another big preservation issue that is frequently overlooked is the model of distribution that many publishers are moving toward, which is a license model rather than a sale model. I won’t get into either the legal or business rationale for this shift, but I do want to focus on its implications for preservation, and for publishers in particular. An important analogy that I make to publishers is that of renting a house versus selling a house. When a publisher sells a house (in this case a book), it passes all the responsibility for the house and its upkeep to the new owner. Now if a person rents that same house, the responsibility for fixing the leaking roof, for painting the walls and for repairing the broken windows generally falls back to the landlord who is renting out the house. Obviously, there is money to be made, and the terms of the lease affect who is responsible for what, but in general the owner is still the primary person responsible for the major upkeep of the house.

In the case of the sale of a book, the publisher is no longer responsible for that item; its preservation passes to the new owner, say the library. It is then up to the library to ensure that the book doesn’t fall apart, that the cover stays clean, and that the pages don’t rip. However, as we move to a license environment, the long-term responsibility for upgrading file formats and for continuing to provide access and functionality falls back to the publisher. The publisher is the landlord, renting e-books to the library community. And this responsibility requires a great deal more effort than simply hosting the file. The publishers will eventually need to repaint, to refurbish, to fix the broken plumbing, so to speak, on this digital collection. I expect that this will be no small feat, and it is something that few publishers are prepared to address.

The Library of Congress has begun thinking about this problem from the perspective of their demand deposit requirement related to copyright registration for LC’s own collection. While they are at the moment focused on electronic-only journals, one can envision a scenario where electronic-only books are not that far away. LC has not explicitly discussed e-book preservation and their current work is only focused on e-journals. However, the problems that LC is facing are illustrative of the larger issues that they likely will face. There are standards for journal article formatting using XML, such as the soon-to-be-released Journal Article Tag Suite (JATS), formerly the NLM DTD. This project, developed by the National Library of Medicine in the US, was specifically focused on developing an archival tagging model for journal article content distribution and preservation. There is no similar model for books that is widely adopted. If the variation in journal markup is significant, the variation in book markup is greater by some exponential factor.

No archive can sustain a stream of ingest from hundreds or thousands of publishers without standards. It is simply unmanageable to accept any file in any format from thousands of publishers. And this, of course, is where standards come in. Although this isn’t at the forefront of either of our presentations today, it does sit there in the not-so-distant background.

And there has been a great deal of focus over the past year on the adoption of the new EPUB 3.0 specification. This is a great advancement and it will certainly help speed adoption of e-books and their overall interoperability with existing systems. However, it should be clear that EPUB is not designed as an archival format. Many of the things that would make EPUB 3 archival exist within the structure, but their inclusion by publishers is optional, not mandatory. The situation is the same with PDF files: accessibility and archiving functionality is possible within PDF, but most publishers don’t take advantage of it or implement it. We, as a community, need to develop profiles of EPUB for preservation that publishers can target, if not for their distribution, at least for their long-term preservation purposes both internally and externally.

It will be a long-term project that we will be engaged in. And it is something that we need to focus concerted attention on, because preservation isn’t the first thing on content creators’ minds. However, we should be able to continue to press the matter and make progress on these issues.”