Newsline March 2018

Letter From the Editor

Consistency of data is a serious problem in our community. For all the effort that the content creation and library communities invest in data, it is striking that the remaining problems are still so significant. Each time that NISO, EDItEUR, ISO, or BISG develops a standard for metadata creation or exchange, several follow-on efforts seem to emerge to clarify and develop community practices around how to use that data exchange standard. Metadata specifications exist for books, e-books, resolution, preservation, and many more things. There are also identifiers for people, for articles, for titles, and for institutions. Ideally, the design of each of these elements would simplify the data exchange ecosystem and smooth data transmission from producer to supplier and to user.

One problem, and there are several, has been in the application of those systems. Each content creator or systems supplier has its own take on the application of a particular specification. Sometimes ambiguity is purposely built into a standard to support flexibility. Other times, it isn't absolutely clear what a data element means in every case; for example, there is no clean or clear definition of what a publication date is. Finally, we often overlook the fact that metadata changes as the state of the world changes. Data that was correct a few weeks or months ago, particularly pre-publication data, isn't necessarily accurate unless it is consistently maintained.

No human process is without errors, and we must accept that some level of inaccuracy is inevitable. We certainly can't fault an organization if its data falls somewhat short of 100% accuracy all of the time, especially when the data elements and fields combined can number in the millions or billions. However, even beyond an understandable margin for error, most providers could do better, and we can all do more to minimize errors. Some of the problems are introduced by people or companies who don't understand, appreciate, or, in the worst cases, even care about the errors they are introducing into system-wide data pipelines. For example, publishers who assign the same ISBN to multiple titles are often flagged and excoriated for their errors. Other problems result from a lack of priority in maintaining these data. Particularly for older content, support for retrospective projects to ensure its accuracy isn't always a top priority, so errors go uncorrected and propagate through the ecosystem.

A variety of projects, both within NISO and outside it, have aimed to improve conformance with specifications or to focus on their consistent application. Within NISO, projects such as KBART, IOTA, PESC, PIE-J, and ESPReSSO were more about consistency and best practice in data exchange than about setting forward new structures or standards. The current NISO E-Book Metadata Working Group is also addressing issues of consistency and communication. Outside NISO but related to our work, the JATS4R initiative and the nascent STS4R aim to improve consistency in the usage of NISO markup standards. Within EDItEUR and BISG, several projects have focused on consistent metadata use, keyword application, and other consistency issues. Metadata 2020 is "a collaboration that advocates richer, connected, and reusable, open metadata for all research outputs."

NISO will continue to work not only to develop standards and specifications, but also to develop communities of practice and consistency in their use. The challenge isn't always with the adoption of a specification; it is with the consistency of that adoption and the reusability of the data being exchanged. In the coming months, NISO will focus on, and draw attention to, the value that consistent use of standards brings to applications. If you have a story related to this, whether positive or negative, please let us know!


Todd Carpenter

Executive Director, NISO