Notes on NISO Plus 2020

This week, I participated in the inaugural NISO Plus Conference. The conference took place in Baltimore, Maryland, from February 23 to 25. NISO stands for National Information Standards Organization. NISO is a key standards-setter for libraries, publishers, and other information providers; it maintains the standards behind MARC (Machine-Readable Cataloging) records, the Dublin Core metadata standard, and JATS (Journal Article Tag Suite), among others. The great thing about this event was that it brought together leaders from every sector of the information value chain for pragmatic conversations about the future of digital publishing and information systems.

Opening Remarks

Todd Carpenter, executive director of NISO, welcomed attendees to the conference on Sunday afternoon. He noted its origins in the merger between the National Federation of Advanced Information Services (NFAIS) and NISO, which took place in February 2019. He expressed his hope that the meeting would represent the best of both organizations, combining vision with attention to detail. He also emphasized that NISO Plus was as much about conversation and community as technology and standards. The goal of the conference was to promote frank discussion of challenges and problems rather than the marketing of solutions.

Amy Brand, director of the MIT Press, delivered the opening keynote. She spoke about the need to diversify the scholarly communications landscape and to avoid creating monocultures in digital infrastructure. Her opening point was that fostering “openness” is not an end in itself. She called attention to the unintended consequences of Tim Berners-Lee’s design for the World Wide Web, which was intended to foster the open exchange of information and linked data but which has instead produced commercial monopolies. If we have learned nothing else from the information wars, she remarked, we have come to appreciate the inextricable links between the content of information and information delivery systems. How can we promote decentralized and diverse information systems rather than centralized infrastructure?

Brand pointed to a series of organizations and technologies working toward openness and diversity in scholarly communications infrastructure, including the publishing platform Authorea, the Open Science Framework, the Confederation of Open Access Repositories (COAR), Invest in Open Infrastructure, the Networking and Information Technology Research and Development (NITRD) Program, and the Research on Research Institute, among others. She also detailed developments at the MIT Press to promote sustainable open access publishing. The MIT Press and the MIT Media Lab have formed a partnership called the Knowledge Futures Group to advocate for open infrastructure. Brand also pointed to the development of PubPub and the Underlay as open platforms. She then announced that the MIT Press had created a new unit, MIT Open Publishing Services, to offer unbundled editorial and publishing services like peer review and copyediting. The NISO Manuscript Exchange Common Approach (MECA) will help make manuscripts portable across systems and allow for the easy use of such unbundled services. She likewise described the success of the CRediT taxonomy in surfacing the work of contributors in more specific and granular ways.

At the conclusion of her talk, Brand returned to the theme that “open is not enough.” She reminded the audience that three critical platforms for shareable scholarship are celebrating their twentieth anniversaries: CrossRef, Creative Commons, and Wikipedia. But what changes of direction must they take to ensure leadership of the open movement during the next twenty years? Brand expressed her hope, for instance, that Creative Commons might issue a new form of license that would allow presses to publish in the open without having to compete against downstream rivals.1

What I appreciated about Brand’s talk in particular was her self-described “pragmatism” about emerging scholarly communications business models; she balanced market and nonprofit motives as well as open and closed solutions. She expressed her worry that disciplinary societies, while offering open access to full text, might get locked into vertical solutions from specialized vendors, which could in turn affect the evaluation and production of new knowledge. If universities do not invest in scholarly infrastructure in a coordinated fashion that aligns the interests of provosts, librarians, and CIOs, they will not be in a position to develop local alternatives, and their faculty might get locked into pricey and restrictive publishing verticals.

Standards: JATS, JATS4R, STS, and SSOS

The next session provided the kind of presentations you would expect from a NISO conference, namely, overviews and updates about four NISO standards.

Deborah Aleyne Lapeyre of Mulberry Technologies spoke about the Journal Article Tag Suite (JATS), which supplies an XML-based standard for marking up journal articles so that they can be shared between publishers and archived in institutional repositories and other preservation platforms. She stressed that JATS is (1) ubiquitous, since nearly every publisher uses it; (2) standardized, as a NISO standard; (3) discovery-enabling, supporting context-based search; and (4) foundational, serving as the “plumbing” for open science in systems like PubMed Central.
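To give a sense of what JATS markup looks like, here is a minimal sketch in Python that assembles a skeletal JATS-style document with the standard library's xml.etree.ElementTree. The element names (front, journal-meta, article-meta, body, back, and so on) come from the public JATS tag set, but the document is deliberately stripped down and omits the required attributes, identifiers, and metadata that a production archive such as PubMed Central would expect.

```python
# A minimal, illustrative JATS-style skeleton built with the standard library.
# Element names follow the public JATS tag set; a production document would
# carry many more required attributes and metadata elements.
import xml.etree.ElementTree as ET

article = ET.Element("article", {"article-type": "research-article"})

# <front>: journal- and article-level metadata
front = ET.SubElement(article, "front")
journal_meta = ET.SubElement(front, "journal-meta")
journal_title_group = ET.SubElement(journal_meta, "journal-title-group")
ET.SubElement(journal_title_group, "journal-title").text = "Example Journal"
article_meta = ET.SubElement(front, "article-meta")
title_group = ET.SubElement(article_meta, "title-group")
ET.SubElement(title_group, "article-title").text = "A Sample Article"
contrib_group = ET.SubElement(article_meta, "contrib-group")
contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
name = ET.SubElement(contrib, "name")
ET.SubElement(name, "surname").text = "Doe"
ET.SubElement(name, "given-names").text = "Jane"

# <body>: the article text, divided into sections
body = ET.SubElement(article, "body")
sec = ET.SubElement(body, "sec")
ET.SubElement(sec, "title").text = "Introduction"
ET.SubElement(sec, "p").text = "Article text goes here."

# <back>: reference lists and other back matter
back = ET.SubElement(article, "back")
ET.SubElement(back, "ref-list")

ET.indent(article)  # pretty-print; requires Python 3.9+
print(ET.tostring(article, encoding="unicode"))
```

Because the metadata, body text, and references all live in one predictable structure, downstream systems can extract citations, build search indexes, or repackage the article for preservation without publisher-specific parsing.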

Melissa Harrison, head of production operations at eLife, delivered the second presentation on JATS4R, a small NISO working group that promotes recommendations for using the JATS standard effectively. The working group enjoys strong support among publishers, though recruiting leaders for its subgroups has proved challenging because everyone is so busy. Still, the group has produced fifteen recommendations to date, covering topics ranging from implementing the CRediT taxonomy to encoding bibliographic citations.

Robin Dunford, senior solution architect at Inera, described the NISO Standard Tag Suite (NISO STS), whose goal is to supply a standard for publishing standards themselves in a consistent format. Beyond NISO, other standards development organizations (SDOs) in Europe and the United States are adopting NISO STS to encode their documents. Robert Wheeler, director of publishing technologies at the American Society of Mechanical Engineers (ASME), then gave an introduction to the Standards-Specific Ontology Standard (NISO SSOS). While both talks focused on information standards in industry and government, they also demonstrated the interest in metadata and ontologies beyond librarianship.

Artificial Intelligence and Machine Learning

I began the second day of NISO Plus by attending a session on artificial intelligence and machine learning. Jason Chabak of Yewno, a Silicon Valley startup that markets AI-enabled discovery tools, described the distinct forms of artificial intelligence, ranging from weak to strong AI. As he explained the potential of weak AI for automating labor, Chabak drew a nervous laugh when he remarked, “It’s not going to take all of our jobs.” At present, we should be looking for repetitive and mundane tasks in our workplaces that we could automate. Brian Cody, a co-founder of the open access publishing platform Scholastica, followed with a talk about how artificial intelligence is already improving the quality of published articles through plagiarism detection, recommendation of peer reviewers, and identification of supplemental references. But Cody also cautioned that we need to watch for misclassifications, since artificial intelligence systems exhibit biases and make mistakes of their own. Huajin Wang, a liaison librarian at Carnegie Mellon University, rounded out the conversation by highlighting the significance of data for machine learning. If you cannot find, clean, document, and combine datasets, you will not be able to create effective AI. She held that libraries need more data curators to continue developing data standards for different disciplinary communities and to make researchers’ datasets more shareable.

Open Access Mandates

Keith Webster, dean of university libraries at Carnegie Mellon University, introduced a panel about open access, observing that the packed room signaled strong community interest in the topic. Lauren Kane of Delta Think, a publishing consulting company, observed that changes to the current publishing ecosystem will inevitably benefit major publishers and research universities but may negatively impact smaller publishers and international universities. Brian Cody of Scholastica provided clarification about Plan S, which combines mandates and technical requirements for distributing open content. He compared these requirements to GDPR, which feels overwhelming to smaller publishers. Can small publishers, for instance those who publish one or two journals in the humanities, ignore evolving Plan S mandates because their authors do not generally work on grant-funded projects? At the moment, no checklist of requirements exists because Plan S is not finalized. Still, major changes are on the horizon, and everyone (libraries included) needs to pay close attention because they will affect every field. Webster observed that as funding moves from readership to authorship, research universities will pay more for publishing than teaching institutions. He cited a twelvefold increase in costs at Carnegie Mellon for ACM publications since moving to an open model because its faculty has the highest output of computer science publications worldwide. Commercial entities like Google, IBM, and Microsoft are top producers and consumers in computer science; should universities with high research outputs effectively subsidize their readership? During the conversation, a worry surfaced that faculty do not have much information about these impending changes. Will their scholarly societies guide them in the transition to these new open access publication models? Kane called on audience members to become better educators, not about the technologies but about the mission and strategy motivating these shifts.

AR/VR/3D - Non-Traditional Content Forms

How should libraries support nontraditional forms of scholarly production? During an afternoon session, panelists discussed both why and how librarians should provide services like 3D printing, photogrammetry, and virtual reality. Carl Grant, interim university librarian at the University of Oklahoma, argued that the library serves as a “Switzerland” for digital experimentation when tools, training, and events might otherwise fall under college or departmental auspices. Chad Mairn, manager of the Innovation Lab at St. Petersburg College, showcased educational applications of augmented and virtual reality. Of note was his use of a so-called Merge Cube to allow students to interact with virtual objects in physical space. Mairn also related his experience with Spatial, an augmented reality tool that allows remote students to “share” classrooms with students on campus. The repeated refrain from both speakers was, “It’s happening today!”

Digital Preservation

At midday, I attended a conversation on digital preservation led by Stephanie Orphan of Portico and Craig Van Dyck of the CLOCKSS Archive. Deanna Marcum kicked off the discussion by asking about the challenge of coordinating a digital preservation program as forms of content continue to diversify. Van Dyck noted in passing that while publishers fund digital preservation for business reasons, library funding has not kept pace. Another question concerned providing access to obsolete file formats. Portico has committed itself to migrating file formats over time; at present, it generates JATS for digital objects during ingestion. A different issue is that journals may not submit supplemental materials (like datasets) along with articles for preservation, meaning that essential components of the research process are at risk of loss. Van Dyck worried about the so-called “long tail” of journal publication, that is, smaller publishers who publish one or two journals and who do not follow industry standards and norms for digital preservation. Linked data forms another pain point: linked data is by nature distributed, whereas digital preservation aims to maintain the integrity of discrete digital objects. A similar problem arises with links to copyrighted content like YouTube videos.

Miles Conrad Award

The final session of Monday afternoon was the awarding of the Miles Conrad Award to James G. Neal, university librarian emeritus at Columbia University. The upshot of Neal’s address was that libraries, publishers, and other information providers must align themselves more closely, moving from collaboration to “parabiosis,” that is, sharing information systems in organic ways that reduce redundancy and overlap among companies and institutions.

Economics of Information

On Tuesday morning, I attended a session on the economics of information. Keith Webster of Carnegie Mellon surveyed the current landscape, noting that libraries are pushing back against the “big deal” and regular price increases for journal subscriptions. Given the public anger at the steady growth of tuition fees, can libraries justify paying more every year for journal subscriptions? Libraries have sustained journal publishing by cutting costs elsewhere, decimating the sales of university presses. But we should not neglect the advantages of the “big deal,” which leveled the playing field among research universities and allowed researchers to bypass library platforms by using Google Scholar for discovery and clicking directly through to article PDFs. Scientists who once spent hours in the stacks now brag that they have not set foot in libraries for years. Webster argued that, as librarians, we want (1) to reduce content (get rid of redundant and unused journals); (2) to reduce costs; and (3) to advocate for open access. He contended that librarians need to understand better the value that publishers add by stabilizing the scholarly record, but admitted that “nobody pretends libraries are getting a good deal out of this.” Still, cancelling the “big deal” forces researchers into ethically problematic situations: when librarians cancel journal packages, researchers turn to “copyright fluid” sites like ResearchGate and Sci-Hub for access to articles. Another challenge is that European research institutions are pursuing gold open access solutions to the journal crisis whereas the United States has advocated for green solutions. “What does success look like?” Webster asked. Will the future of publishing be fully open access, or will it be a hybrid model? As we contemplate where we are heading, we should also consider the changing nature of the scholarly record, which now includes datasets and software code; Webster recommended reading, for instance, OCLC’s white paper, The Evolving Scholarly Record, to get a sense of the shape of digital publishing to come.

Preservation and Archiving of Digital Media

After lunch, I participated in a panel on the preservation of digital media chaired by Wendy Queen, director of Project MUSE. My co-panelist was Leslie Johnston, director of digital preservation at the U.S. National Archives. Johnston provided an overview of the heterogeneous content that her team manages at NARA, which numbers in the billions of objects. Given the size of these datasets, NARA has moved to the AWS GovCloud but still conducts its own fixity tests. Archivists at NARA also interact with data in the cloud rather than downloading content locally for analysis. For my part, I provided an overview of the preservation challenges at the Vanderbilt Television News Archive from a financial, technical, and legal perspective; my paper on the subject is forthcoming in Information Services and Use.
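The fixity tests Johnston mentioned are conceptually simple: record a cryptographic checksum for each object at ingest, then periodically recompute and compare. The sketch below illustrates that idea in Python using the standard hashlib module; it is only an illustration, not NARA's actual workflow, and the tab-separated manifest format is invented for the example.

```python
# Illustrative fixity check: recompute SHA-256 checksums and compare them
# against a stored manifest. The manifest format here (path<TAB>digest per
# line) is invented for this example, not an archival standard.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large objects never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path: Path) -> list[str]:
    """Return the files whose current checksum no longer matches the manifest."""
    failures = []
    for line in manifest_path.read_text().splitlines():
        if not line.strip() or "\t" not in line:
            continue  # skip blank or malformed lines
        file_name, expected = line.rsplit("\t", 1)
        target = manifest_path.parent / file_name
        if not target.exists() or sha256_of(target) != expected:
            failures.append(file_name)
    return failures

if __name__ == "__main__":
    bad = verify_manifest(Path("manifest.tsv"))
    print("fixity failures:", bad or "none")
```

Run on a schedule against an archive's holdings, a routine like this flags silent corruption or missing files so that a clean copy can be restored from a replica before the damage propagates.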

Closing Remarks

danah boyd, principal researcher at Microsoft Research, was the closing speaker of the conference. In her talk, boyd described the problems of data-driven systems, which are rife with error and systemic biases. We may hope that humans can intervene to identify and fix such biases, but inserting humans as backstops in data-driven processes is not easy. On the one hand, people may share the biases of the AI systems they manage. On the other, they may not have kept up the skills to react efficiently when such systems break down. Worse, bad actors can manipulate the data in ways that distort the information these systems produce. Michael Golebiewski and boyd have described the phenomenon of data voids, in which agents of disinformation exploit the absence of authoritative results on search engines or social media by filling the gaps with conspiracies and other forms of misinformation. Another common vector of attack is comments on posts and social media, which lead readers to sources of disinformation. She warned about coordinated efforts to undo knowledge, pointing to Robert N. Proctor and Londa Schiebinger’s concept of agnotology, that is, the study of ignorance, as key reading for understanding such attacks. How do we address the growing biases, disinformation, and breakdown of trust affecting our information ecosystem? boyd contended that our response needs to be “sociotechnical” because the problem has become systemic. Any intervention needs to extend beyond fact-checking and information literacy instruction to structural solutions.

In my estimation, the inaugural NISO Plus conference was a major success. Jason Griffey and his colleagues deserve much credit for bringing together such a diverse intellectual community. As librarians, we need to talk with our colleagues at publishing companies and other commercial information providers in relaxed settings to understand our common challenges and to chart new ways forward together. On a personal level, I came away with a better understanding of the centrality of JATS to contemporary publishing workflows, from distribution to aggregation to preservation. I hope that more research libraries will adopt JATS in open access publication platforms like institutional repositories to foster easier sharing and preservation of digital documents.