We Can’t Let AI Generation Tools Take Away Our Own Training

Letter from the Executive Director, November 2023

No one starts off doing something well. We all learn. We all improve over time, at different paces and to the best of our abilities. To become really good at something takes time, effort, and dedication. As someone recently posted on social media, “Doing something for the first time is like playing the violin. Anyone can pull the bow across the strings and make noise, but if it’s your first time and you’re not horrible at it, something special is happening.”

Sadly, I see this at play in our obsession with the rapid adoption of generative artificial intelligence (AI) tools. We all hope that we can write a book, craft a thoughtful essay, or produce working code with the help of these tools, without really doing the groundwork to get good at these difficult tasks. We expect greater and greater things from our kids as students, as graduate students, or as emerging writers. Everyone is judged by the highest standard of accomplishment, without understanding that no one emerged a Pulitzer Prize–winning author, a Nobel laureate, or an Oscar-winning screenwriter without going through the gauntlet of practicing and honing one’s craft. Generative AI tools have the potential to rob us of the opportunities for exploration and skill development that come with doing something poorly at first. We all grow and learn; we all make mistakes. This is okay. In fact, it’s what makes our explorations so rich and the resulting outputs so much better.

Last month, when the Writers Guild of America reached an agreement on its next contract, one aspect of the negotiations centered on generative AI tools and their application in the writing process. Obviously, issues of compensation and streaming royalties played a more important role in the agreement, but the fact that this was being discussed at all signifies an interesting inflection point in our culture. While there were many mundane and predictable shows on TV before generative AI tools existed, and there will likely be many more hereafter, the ability of a machine to generate a reasonably workable script with hardly any effort or cost is concerning enough for the writers’ union to seek to block this use of the technology from advancing. Much as Andrew Pace suggested in his provocation at the NISO Plus Forum, what are the second-order questions we should consider in this context? One of them, from my perspective, is this: What happens if the “easy work” is left to machines, so that no one learns how to do it, even though doing it is a prerequisite for understanding how to improve the harder work? The impact on our community could be more profound, and potentially more damaging, than a proliferation of bad movies.

If people don’t learn how to do the basics (be it playing the violin, screenwriting, or writing academic papers), the more difficult tasks will be harder to conceptualize and improve upon. Learning from our mistakes and improving on them is core to our humanity. It’s important to understand that language models lack this capacity to know the direction of “better.” When AI tools are trained on a corpus of machine-generated data, the results degrade. This feedback loop of non-human content can lead to “model collapse,” in which the generated output becomes increasingly unreliable. In a blog post earlier this summer, Ross Anderson, professor of security engineering at Cambridge University and the University of Edinburgh, wrote, “Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the internet with blah.” That “blah” content causes models to generate gibberish because the content on which they are trained is probabilistic and not rooted in reality. While humans learn from their mistakes and assess validity against reality, neural network models have no such connection to the real world and therefore drift ever further into a netherworld of possibilities.
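
To make that feedback loop concrete, here is a minimal, hypothetical sketch (not drawn from the Forum discussions or from any particular system): each generation of a simple statistical “model” is fitted only to samples produced by the previous generation, and the distribution it represents gradually degrades.

    # A toy illustration of the "model collapse" feedback loop: each
    # generation's "model" (here, just a fitted Gaussian) is trained only on
    # samples produced by the previous generation. With finite samples, the
    # fitted spread tends to shrink and the mean drifts, so later generations
    # forget the original distribution -- a numerical analogue of training
    # language models on machine-generated "blah."
    import numpy as np

    rng = np.random.default_rng(seed=0)

    mean, std = 0.0, 1.0          # generation 0: the "real" data distribution
    samples_per_generation = 20   # small samples exaggerate the effect

    for generation in range(1, 51):
        # Train the next model only on output sampled from the previous one.
        synthetic = rng.normal(mean, std, size=samples_per_generation)
        mean, std = synthetic.mean(), synthetic.std()
        if generation % 10 == 0:
            print(f"generation {generation:2d}: mean={mean:+.3f}, std={std:.3f}")

This is only a cartoon of the dynamic, of course; the point is that nothing in the loop ever checks the output against reality.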

When our community met at the NISO Plus Forum in Washington, DC, last month, these ideas were also on the minds of the participants. During a thoroughly engaging day, we discussed how AI tools are affecting our community, and vice versa. With the ever-increasing availability of vetted open access content, it’s possible to train AI models on a corpus of trusted scholarly research rather than on the “blah” being generated and distributed on the internet.

One suggestion for potential work within NISO was to begin exploring how to vet the content on which a model is trained, so that the model itself can then be trusted. Many of the other ideas also focused on trust and transparency regarding the role that AI tools play in the generation of content. It was agreed that while we can’t possibly police the use of these tools, we can certainly establish norms of practice for assigning credit and for creating awareness of when certain tools are used in the process of doing science. Acknowledging the tools used to generate text, images, code, etc., in the research process would certainly help in establishing and maintaining trust in the results.

About a dozen other ideas were generated at the NISO Plus Forum. Many of them will form the backbone of a thread at the in-person NISO Plus conference taking place in Baltimore in February 2024. Registration for the meeting has just opened. We look forward to seeing you in person, when we can discuss these issues in more detail and decide, collectively, what we can do about them. In the meantime, you can read more about the Forum and its outputs.

Sincerely,

Todd Carpenter
Executive Director, NISO