Digital Preservation in Capable Hands: Taking Control of Risk Assessment at the National Library of New Zealand

New Zealand’s digital documentary heritage is encoded according to a diverse array of file formats. Identifying and characterizing these formats is a constant challenge, one that makes it difficult to establish the accurate risk view of the content needed to mitigate format obsolescence.

The National Digital Heritage Archive (NDHA) of the National Library of New Zealand Te Puna Mātauranga o Aotearoa has concluded that the measurement of conformance of files to a format standard for such risk analysis is at best insufficient and at worst harmful. For the digital documentary heritage of New Zealand, the ideal is the measurement of individual file profiles against application specifications. This gives a meaningful and actionable risk view of our content.

With no limitations or control over the format of the content that is collected and preserved, the Library has issues to resolve before the long-term preservation of digital collections can be assured. There are many significant obstacles that make the term “permanent access” an almost meaningless catchphrase when applied to such a collection of digital content made up of disparate file formats. Solving these and other problems is the responsibility of the National Digital Heritage Archive (NDHA) and a significant step has been taken through the development of the Rosetta preservation repository system in conjunction with Ex Libris Group.

While the life span of content stored on physical materials such as paper, glass, wood, and stone can be accurately predicted based on hundreds of years of experience, backed by scientific research into material composition and the effects of environmental conditions like temperature and humidity, the best that the preservation community can do with digital material is to make educated guesses based on a few decades of mostly anecdotal experience. The concept of information encoded according to a file format has existed only since about the 1950s, so the field of digital preservation must be considered still in its infancy.

Happily, significant advances in data storage and management now permit cultural heritage institutions to manage enormous digital collections of permanently valuable material in online (or nearly online) repositories of spinning disks and/or robotic tape libraries. Through the use of checksums to detect bit rot or corruption, virus scanning to protect against malicious code, robust network and physical security, and comprehensive disaster planning, it is not too far-fetched to believe that it is now possible to guarantee bitstream preservation, which is to say, preserving deposited files perfectly in their original form. We view this as “passive preservation” that is foundational to digital preservation.

Unfortunately, while the perfect preservation of a human-readable format such as a paper manuscript is usually synonymous with access to its content, bit-preservation of electronic formats is not. The inevitable obsolescence of the hardware and software components necessary to interpret and render files in a usable form makes it necessary to complement perfect but passive preservation with some form of active, managed preservation. (We are painfully aware that we do not discuss our use of the word “render” in more detail. It is a loaded term with many levels of interpretation. We are currently defining it internally, as it is critical to our risk analysis, but space prevents us from exploring it further in this paper.) Active preservation demands an accurate risk view of the repository. This risk view is the mechanism that gives the NDHA enough warning to take action so that access to the material can continue.
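
As an illustration of the checksum-based fixity checking mentioned above, the following is a minimal sketch in Python and not a description of how the NDHA or Rosetta implements it. The function names (`sha256_of`, `record_fixity`, `audit_fixity`) and the choice of SHA-256 are assumptions made for this example. The idea is simply to record a digest for each file at deposit and to recompute and compare it during later audits.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest mapping each file path to its digest at deposit.

    Hypothetical helper: a real repository would store this in its
    preservation metadata rather than in an in-memory dict.
    """
    return {str(path): sha256_of(Path(path)) for path in paths}


def audit_fixity(manifest):
    """Return the paths that are missing or whose digest has changed."""
    failures = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

An empty list from `audit_fixity` indicates only that the bitstreams are unchanged. It says nothing about whether the files can still be rendered, which is why passive preservation of this kind has to be complemented by the risk view discussed above.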

