The Future Battles Over Metadata Records

Letter from the Executive Director, July 2022

Last month, I posted an article on the Scholarly Kitchen about the lawsuit filed by OCLC against Clarivate. The post was primarily historical in focus and provided an overview of some of the issues at play in the suit. The case is continuing to play out. On June 27, 2022, the judge issued a restraining order and called for discovery, in recognition of OCLC’s claims. Where this eventually ends remains to be seen, but the long-term implications for the marketplace of bibliographic data will be significant.

Let us begin with what should be obvious to readers of NISO’s Information Organized: High-quality metadata is critical to a functioning ecosystem for discovery, delivery, and interoperability of both digital and physical content. Without quality metadata, libraries wouldn’t be able to purchase content; publishers couldn’t distribute content; users couldn’t discover the content they needed; and libraries couldn’t circulate the content they acquire. If any of these were possible without quality metadata prior to our machine-interoperable world, they certainly are near-impossible without it today.

The marketplace is becoming flooded with free-to-read content because of advances in the adoption of open access publication, including not just articles and reports, but also books and other free-to-consume content. Having tools to navigate this ecosystem will, again, depend on metadata. More importantly, tools that can bring readers to the content most relevant to their information needs will become increasingly valuable. The fourth of Ranganathan’s Laws of Library Science states that librarians should “save the time of the reader.” This work inherently requires the highest-quality metadata and services that bring users the content they require, while distinguishing valuable content from less valuable or less relevant resources. If there is a solution to this problem, metadata will likely play a central role.

Fortunately or not, the world is awash in metadata as much as it is in content. Hundreds of institutions in our community have released a range of bibliographic data, from authority files to complete cataloging records. There are billions and billions of linked-data triples available, upon which one could build a catalog system. Metadata is a commodity, but varying quality is its distinguishing characteristic. Much as petroleum comes in specific grades (there are more than 150 varieties), metadata quality can range from very basic to very detailed, and from dubious to highly trusted. Whether the metadata is sufficient merely to get you a copy of Hamlet, or sufficient to get you the text of the Third Folio and indicate how much that version differs from the First Folio, will depend entirely on the detail in the record. The value of those differences depends entirely on one’s use case. If you simply need a copy of the text for your own casual reading, probably any version will do. A scholar examining the subtle differences in language as the editions have changed will require a much more detailed level of specificity in the record. Librarians have devoted careers over the decades to improving the quality and detail of the records that describe items in their collections. Aggregators of bibliographic data have done their part to add and enhance records, thereby adding to the quality of the overall collection of data about library items.

The legal status of a cataloging record is ambiguous at best, and a record is most likely not subject to copyright at all in the US. In US law, what is copyrightable and therefore entitled to legal protection is an “original” work. Protection turns primarily on creativity in the making of new content; content that lacks creativity is not subject to copyright. In the case of cataloging records, it would be difficult to assert that the collection of standard facts is a creative endeavor. The controlling case on the compilation of facts in US law is the 1991 Supreme Court case, Feist Publications, Inc. v. Rural Telephone Service Co., in which the court ruled that the information in the telephone book at issue was factual and therefore not copyrightable, although the expression and the organization of factual information can be. Under this ruling, information (that is, facts, discoveries, etc.) from any source may be reproduced, so long as what is taken does not include “expressive” content from the author.

In the case of catalog records, one might argue that the assignment of a cataloging number is a unique, creative element, but the rest of the information in the catalog record is normally (though potentially not always) factual. Even the assignment of catalog numbers (especially if they are sequential or algorithmically derived), or other “enrichment” with additional facts described in standard formats, likely doesn’t amount to creativity in cataloging. While sui generis database rights in cataloging data may exist in the European Union, those rights do not exist in the United States. It will be interesting to see if OCLC, in its suit, tries to extend database rights by pursuing an infringement case in an attempt to overrule the Feist precedent. Given the current SCOTUS willingness to jettison precedent, such an approach might be a viable strategy for OCLC (though notably not in the interests of the library community it ostensibly represents, given the strong and longstanding advocacy against such rights by library organizations).

Fundamentally, these battles over cataloging records are about power: who controls it and who has the right to wield it. Collectively, libraries have given over their data to OCLC and entrusted the organization to use it to its best ends. Whether OCLC is meeting those ideals or not is up to its members to determine. It is worth asking whether a community-controlled infrastructure can be monopolistic. If the majority of the community agree to common principles about how data is created, shared, and governed, and participants have a controlling say in the principles that govern its interactions, this could be viewed as the community acting in its own collective interest. This is a fundamental principle of community-owned infrastructure, as some describe it. However, this control is creating outliers, who might, for whatever reason, not want to participate in the collective or who find that the terms don’t suit their ends. There are still others in the community who are driven by principle to pursue open exchange of information regardless of the eventual outcome or effects. While these positions might be in the minority today, they do represent a growing segment of the population.

Setting aside the question of whether it makes sense for a community-run organization to seek to capture the benefits of monopoly control over this ecosystem, the question of this model’s long-term viability looms large. Realistically, the control that OCLC currently exerts over the marketplace for bibliographic records isn’t likely to be sustainable. A sufficiently large amount of data is freely available that players in this space could assemble an equivalent pool of resources with which to develop competing assets, if they were so motivated. While the court has temporarily restrained the current development of MetaDoor, there is little reason to think other attempts at opening up this data will be effectively blocked in the long term. However, OCLC may well be able to justify its place of primacy in this market, owing to the quality of its data and the resources it invests in enriching the data it has. To draw an analogy, the free availability of market trading data has had only a marginal negative impact on the value of a Bloomberg terminal in financial markets; it still costs roughly $2,000 per month.

Increasingly, libraries will develop uses for cataloging data that, for whatever reason, will conflict with the business model or services that OCLC provides. This might drive the library community into more silos to address its needs, outside of the ecosystem that OCLC chooses to support, or other providers might create similar resources that can support those libraries’ needs. One might see services like shared collections (such as DPLA or BTAA’s Big Collection) as examples where competing interests in cataloging data might drive this diversification in the marketplace. Who can tell what other innovative uses of library data might be created or demanded based on collective metadata aggregation? Either the dominant players in this market will choose to support these potential innovations, or the customers of those players will consider whether they can make do with a lesser-quality metadata record or create their own quality records, and thereafter splinter into different silos based on functionality or approach. Regardless, the landscape of metadata and cataloging infrastructure is changing, and there will be many battles along the way over how we achieve the goal of serving users with the best available content and services.



Todd Carpenter
Executive Director, NISO