Why do we need a format registry for digital preservation?

If you had diligently protected a WordStar document for the last twenty-five years, all of its original bits might still be intact, yet the document would be unusable for most people, because today’s computers rarely include software that can open the WordStar format. It’s not enough to keep digital bits safe; to fully preserve digital content we must make sure that it remains compatible with modern technology. Given that the ultimate goal of digital preservation is to keep content usable, how do we accomplish this in practice? We need to be able to answer two questions: (1) is the content I’m managing in danger of becoming unusable, and if so, (2) how can I remedy the situation?

Formats play a key role in determining whether digital material is usable. A traditional book is human-readable and gives the reader immediate access to its intellectual content. A digital book reaches its reader only through layers of technology: hardware runs software, the software understands formats, and the formats give structure to the bits that carry the intellectual content. Without this technological mediation, a digital book cannot be read. Formats are the bridge between the bits and the technologies needed to make sense of them, so knowing the format of a set of bits is the key to knowing whether any technology can make those bits usable.

Return to the first question: is the content I’m managing in danger of becoming unusable? We can answer it if we know the formats of the content we’re managing, along with additional information about those formats. We need to know whether acceptable current technologies support the formats, what sustainability issues affect them, and how others in the digital preservation community have assessed them. If we determine that the content is in danger of becoming unusable, further format information lets us form a remediation plan. We need to know the alternative formats available for the content, the transformation or emulation tools that support them, and, as a last resort, enough documentation about the format to construct our own tools to transform or render the content.
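
To make the two questions concrete, here is a minimal sketch in Python of what one registry entry and the two checks might look like. Everything in it is an assumption made for illustration: the FormatRecord class, its field names, the risk rule in is_at_risk, and the sample values are hypothetical and do not describe any existing registry's schema or policy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormatRecord:
    """One hypothetical registry entry; field names are illustrative only."""
    identifier: str
    name: str
    current_software: List[str] = field(default_factory=list)
    sustainability_issues: List[str] = field(default_factory=list)
    community_assessments: List[str] = field(default_factory=list)
    alternative_formats: List[str] = field(default_factory=list)
    transformation_tools: List[str] = field(default_factory=list)
    emulation_tools: List[str] = field(default_factory=list)
    documentation: List[str] = field(default_factory=list)


def is_at_risk(record: FormatRecord) -> bool:
    """Question 1. Assumed rule: a format is at risk when no current
    software supports it or when any sustainability issue is recorded."""
    return not record.current_software or bool(record.sustainability_issues)


def remediation_options(record: FormatRecord) -> List[str]:
    """Question 2. List the remediation paths the recorded data supports."""
    options = []
    if record.alternative_formats and record.transformation_tools:
        options.append("transform")
    if record.emulation_tools:
        options.append("emulate")
    # Last resort: enough documentation to build our own tools.
    if not options and record.documentation:
        options.append("build tools from documentation")
    return options


# Invented sample data, not a statement about any real format.
legacy = FormatRecord(
    identifier="example/legacy-wp",
    name="Example legacy word-processor format",
    alternative_formats=["example/open-text"],
    transformation_tools=["example-converter"],
    documentation=["example format specification"],
)

if is_at_risk(legacy):
    print(remediation_options(legacy))
```

A real registry would need a richer risk policy than a single boolean, and each institution would likely weigh community assessments differently; the sketch only shows that both questions can be answered from data attached to the format.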

All institutions engaged in long-term digital preservation need this same format information. The concept of the format registry is simple: pool and share the data, so that no institution has to collect and manage this information for itself or maintain in-house expertise for every format it manages. Additionally, because the format registry would provide authority control for format names and identifiers, it would enable institutions to more easily share file tools and services, and to exchange content.

