AI & Machine Learning in Scholarly Publishing: Services, Data, and Ethics
NISO Plus Forum, October 3, 2023, Washington DC
Delivered as a “Lunchtime provocation” as a response to the prompt: What is the largest potential disruption regarding AI/ML from your perspective?
“What’s Your Second Question?”
Andrew K. Pace, Executive Director, USMAI Library Consortium
In one word, I think the biggest potential disruptor to Artificial Intelligence and Machine Learning is humans. People are not only the biggest disruptors in the field; we might also be its only hope.
As scholars, academics, purveyors of intellectual freedom, and in some cases, the last defenders of democracy, we have a special responsibility. Experts in Machine Learning will create new efficiencies. With more and more training, Artificial Intelligence (which today is merely Machine Learning with a marketing budget) will eventually mature. Its very generative nature will generate more intelligence on its own.
But more importantly, our stock in trade—truth, ethics, academic integrity—will be critical.
Back in 2018–2019, before the disruption of the pandemic, before the hype of AI, and before my new role in library consortia, I commissioned a report for OCLC Research. Like today’s look at AI in scholarly publishing, we wanted a deeper dive on machine learning in libraries. Thomas Padilla, now with the Internet Archive, conducted dozens of interviews and a year’s worth of research to produce this report. But we argued over its title. Even five years ago, he argued that the conversations kept coming back to ethics, to the “human in the loop,” to a particular responsibility that would not be the focus of large for-profit corporations and consumer-driven operations. Even though someone once told me that I would rather win an argument than come to the right conclusion, this time the latter prevailed. I was convinced that “Responsible Operations” should be the main title of the report, not its subhead. Thus, Responsible Operations: Data Science, Machine Learning, and AI in Libraries, by Thomas Padilla, was published by OCLC Research in 2019.
Rather than a list of answers, the report is an agenda of the things that need our attention. Like the so-called Digital Library in the 1990s, higher education work in data science, machine learning, and AI has very little agency. We don’t have a set curriculum, standard training, a starting bibliography, or an organization focused on any of the above. NISO is one of the few organizations trying to define and align common goals or convene competing interests. Only a few places are training professionals with the kind of specialty in library and information science that will distinguish us in the field.
So yes, humans are the disruptors. And one of the most disruptive things we can do is ask questions. Lots of questions. But I’d like to suggest that we give these questions extra thought. Here’s an example of a bad question about AI: How can we stop it? The scholarly community tried this one over twenty years ago when Wikipedia was created. We spent the next ten years trying to stop it, fight it, disparage it. Only recently has a cautious embrace begun. In this decade, I’ve seen syllabi that tell students to stay away from Wikipedia, even to gain encyclopedic knowledge on a topic. Going back an additional decade, some of us might remember those who decried the Web itself. The clinical professor in my library school program in 1994 declared the Web a fad and encouraged anyone who wanted to be a systems librarian to learn how to build PCs and servers and design hardwired networks. But who was I to judge, as I trained to be a rare book cataloger? Thank God the Web sent me in another direction. Thank God for fads. Librarianship in the 1990s—good times.
So let’s skip the “how do we stop it?” question. Instead, here is what I would encourage: say the first question in your head, and then ask yourself…what is my second question?
I was encouraged to think like this when I read about a professor who was asked what he would do with student papers submitted by generative AI. His response was something like “I don’t know, I was already thinking about how I could use AI to grade them.” He asked the second question. Let’s think of some first question / second question examples that are applicable to us.
First question: What happens if someone submits an article for publication that was created by AI?
Second question: How can I use AI to do peer review?
First question: How do I punish someone for using a fake article with a fake citation?
Second question: How do I cite this generative AI article and AI author?
First question: What if someone uses my data as training data for their machine learning?
Second question: What kind of data do I have that might eliminate bias, inequity, and unfairness in the machine?
[this one is going to sound a little harsher]
First question: Will AI replace me?
Second question: What and who can I replace with AI?
First question: How can I distinguish between AI systems and humans?
Second question: How can we ensure that AI systems are aligned with human values?
First question: How can I get me some AI?
Second question: How can we ensure that the benefits of AI are shared equitably?
And finally, referring to my opening point...
First question: How can I operationalize AI in my organization?
Second question: How can I responsibly operationalize AI in my organization?
Lastly, I would suggest that we have some time to take a beat between our first and second questions. I’m a big believer in the Gartner Hype Cycle. I’ve never witnessed a technology that didn’t succumb to the cycle. Speed and longevity through the cycle might vary, but every technology follows it. If you’re not familiar with the cycle, there’s usually a technology trigger; then picture a steep climb, a long descent, then a slow rise to a plateau. Right now, we are on the climb to what Gartner calls the peak of inflated expectations (think Second Life or blockchain). This is inevitably followed by some amount of time in the trough of disillusionment (think Linked Data, or Blu-ray discs), and then, if the technology is fortunate, it begins an ascent up the slope of enlightenment toward a plateau of productivity.
I would argue that since the advent of the Web, technology in publishing and libraries has always been a second-mouse-gets-the-cheese proposition. We’re content, if not financially positioned, to let larger, less risk-averse organizations take the first grab at that big cheese. It’s usually better to be the second mouse in this scenario. We’ve occasionally been on the cutting edge, but hardly ever on the bleeding one. It is more in our nature to consider the consequences of change, to grapple with standards and best practices, to measure twice (or twenty times) before we cut. I joke that librarians love to tell you what a new technology will not be good for. But in the case of machine learning and AI, I think we should make our first principles our position of strength, not weakness. Let’s embrace trust, ethics, standards, best practices, diversity and inclusion, accessibility, privacy, and academic integrity, and create some agency in our profession for responsible operations. Let’s keep our wits about us while Wall Street plunders, the government flounders, and the Chicken Littles scurry. And let’s ask lots of second questions.