Discoverable, Available, Accessible: Preserving Digital Content
Below are the questions submitted during the September 14, 2011 webinar. Not all of them could be answered during the live session, so those left unaddressed at the time are also included. Answers from the presenters will be added when available.
- Standards in the World of Preservation
Amy Kirchhoff, Archive Service Product Manager, Portico
- Rosetta, a Digital Preservation System
Ido Peled, Rosetta Product Manager, Ex Libris
- CRL Assessment and Evaluation of Digital Repositories
Marie-Elise Waltz, Special Projects Librarian, Center for Research Libraries
Feel free to contact us if you have any additional questions about library, publishing, and technical services standards, standards development, or if you have suggestions for new standards, recommended practices, or areas where NISO should be engaged.
NISO Webinar Questions and Answers
1. How important is preservation of an object in its original format, even when the original is problematic - proprietary format, outmoded version, etc?
Amy Kirchhoff: Often it is quite important to preserve the original version of the file alongside its current instantiation. Preserving both gives a preservation agency the most choices in the future.
Ido Peled: Learning from our customers, we see that preserving the original objects in their original format is very important. Keeping the original object ensures that preservation actions can be rolled back, and it also gives institutions maximum flexibility in the future, when other (better) alternatives may show up. In addition, institutions need to think about the cost of storing all the revisions of an object. A solution must provide a means of defining revision storage policies, such as: keep the original and latest revision, keep the last X revisions, etc.
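The two revision storage policies named above can be sketched in a few lines of Python. This is only an illustration, not how Rosetta or any other system implements such policies: the `Revision` type and both function names are hypothetical, and the second policy assumes the original is always retained in addition to the last X revisions, in line with the emphasis on keeping originals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Hypothetical record for one stored revision of an object."""
    number: int          # 0 is the original object as deposited
    storage_bytes: int   # space this revision occupies


def keep_original_and_latest(revisions):
    """Policy: keep only the original and the most recent revision."""
    ordered = sorted(revisions, key=lambda r: r.number)
    if not ordered:
        return []
    kept = [ordered[0]]
    if len(ordered) > 1:
        kept.append(ordered[-1])
    return kept


def keep_original_and_last_n(revisions, n):
    """Policy: keep the original plus the last n later revisions.

    Retaining the original here is an assumption of this sketch.
    """
    ordered = sorted(revisions, key=lambda r: r.number)
    if not ordered:
        return []
    later = ordered[1:]
    tail = later[-n:] if n > 0 else []
    return [ordered[0]] + tail


def storage_cost(revisions):
    """Total bytes needed to store the given revisions."""
    return sum(r.storage_bytes for r in revisions)
```

Comparing `storage_cost` across the kept sets gives a rough view of the storage trade-off each policy implies for a given object.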
2. In terms of digital preservation - how important is it to normalize the acquired file formats to open access formats on ingest (instead of waiting to migrate later on, e.g. 5 yrs down the road)?
Amy Kirchhoff: Each organization must perform its own risk assessment and analysis based upon the policies it has set and contractual services it has offered. It is difficult to come up with a rule of thumb. Some of the factors to consider in the analysis are:
- The commitments the organization has made to its constituents.
- The “messiness” of the content under consideration and whether or not the organization would need to reach out to the content creator for explanations at the point of migration.
- The make-up of the archive (e.g., if the vast majority of the archive is TIFF images, but you occasionally receive a Word document, it may simply not be worth it to transform those occasional Word documents into something else)
- Skill-set of the archive staff
- Budget and cost
- Previously established policies
At Portico we have normalized the XML files for both journal articles and books because the files are so idiosyncratic and troublesome in their original formats that the archive cannot be managed uniformly. However, for d-collections we do not do an initial transformation, even though the content is also provided in proprietary XML formats. The d-collection content we preserve is very clean and crisp and we have confidence that we could migrate it in the future with ease, if we needed to do so.
Ido Peled: Deciding when to migrate content may depend on various factors: storage availability, institutional policy on best practices and on the formats in use, and more. From discussions with institutions we find that the smaller the number of formats used in the repository, the easier it becomes to ensure their long-term preservation and access. Rosetta enables migration on ingest as well as migration at a later stage. Each migration action is the result of a preservation planning process that evaluates the most suitable alternatives according to business and technical criteria.
3. Are all repositories in LOCKSS audited in common?
Marie-Elise Waltz: An audit using the TRAC or TDR Checklist evaluates the administration and policies, object handling, and technical infrastructure of a repository. LOCKSS provides an infrastructure (digital preservation tools and support) that enables multiple organizations to collaborate on a common preservation goal. The auditing process for an implementation of digital preservation that uses LOCKSS would have to take into account not just the LOCKSS infrastructure but also the particular LOCKSS implementation project, which would include things like the number of organizations participating in that particular project, the level of support (from LOCKSS) for their participation, and so forth. It is likely that an audit of a LOCKSS implementation project would audit the implementation as a whole. That would likely include examining appropriate TDR metrics for the system as a whole (e.g., deposit/retrieval of an item, logs demonstrating automated detection and repair of damaged content, etc.). Depending on the size of the project and the number of partners and installations, it would also likely include checking appropriate metrics against installations at some number of randomly selected individual project partners, but not checking every partner individually. A resulting certification would address the specific LOCKSS project as a whole, not individual partners. After a certification of a LOCKSS project, a partner would certainly be able to say that they are part of a certified project, but would not be able to say that their LOCKSS box was certified for other projects. This would be somewhat analogous to certifying a standalone repository without examining every disk or issuing a certification for a particular set of disks and servers.
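To make the idea of "logs demonstrating automated detection and repair of damaged content" concrete, here is a deliberately simplified Python sketch. It is not the LOCKSS protocol and does not use any LOCKSS API: the function name, the in-memory `copies` dictionary, and the simple majority rule are all assumptions made for illustration. Each partner's copy is hashed with SHA-256, copies that disagree with the majority are replaced, and every outcome is written to a log of the kind an auditor might ask to see.

```python
import hashlib
from collections import Counter


def audit_and_repair(copies):
    """Check copies held by several partners and repair damaged ones.

    copies: dict mapping a (hypothetical) partner name to the bytes it holds.
    Returns (repaired_copies, log_lines).
    """
    if not copies:
        raise ValueError("no copies to audit")

    # Compute a fixity value (SHA-256 digest) for every partner's copy.
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}

    # Treat the digest held by a strict majority of partners as correct.
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority agreement; manual review needed")

    good_copy = next(data for name, data in copies.items()
                     if digests[name] == majority_digest)

    repaired, log = {}, []
    for name, data in copies.items():
        if digests[name] == majority_digest:
            repaired[name] = data
            log.append(f"{name}: ok")
        else:
            repaired[name] = good_copy
            log.append(f"{name}: damaged, repaired from majority copy")
    return repaired, log
```

A real distributed system would exchange digests over the network and record timestamps and object identifiers, but the resulting log plays the same role as evidence in an audit.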