Lunchtime Provocation: Emily Singley

October 2023

AI & Machine Learning in Scholarly Publishing: Services, Data, and Ethics

NISO Plus Forum, October 3, 2023, Washington DC

Delivered as a “Lunchtime provocation” in response to the prompt: What is the largest potential disruption regarding AI/ML from your perspective?

What is the largest potential disruption regarding AI/ML from your perspective? Are there corresponding opportunities, and if so what are they?

AI has the potential to reinvent how we communicate about science - how we create the scientific literature, how we interrogate and interact with it, and how well we all, as a broader society, understand the scientific literature

AI models have the potential to make science communication not only vastly more efficient, but also much more inclusive and universal.

This reinvention is already well underway

How will searching change? What if instead of “searching” or “discovering” information in the literature, we could interrogate it, dialogue with it?

Think about how we’ve traditionally searched the literature – typing a few keywords into a scholarly database, getting back thousands of articles
Then painstakingly sifting through those to find what we are looking for
That’s a big waste of time
AI promises to change that
AI has the potential to eliminate the “needle in a haystack” problem - when you need to find something really specific that’s buried in the text of a paper
Example:
- My friend Jenny is a neuroscientist who studies decision making in monkeys using MRI protocols
- One time, her protocol wasn’t working the way she expected, and she couldn’t figure out why
- She needed to find other people who were doing similar experiments, using similar protocols
- Now, that is a really difficult thing to search for - papers aren’t organized or tagged by protocol, and that information is often buried in a passage of text
- Jenny didn’t really care whether the research topic was the same - it didn’t need to be decision making, or even monkeys; it was the methodology she was interested in
- She got really frustrated and spent a lot of time on the search - time she could have spent on experiments and advancing scientific discovery
In the not too distant future, every single search box is going to be AI-assisted, able to find these kinds of answers instead of spitting back lots of papers
And we will see more and more LLMs pointed at trusted, reliable data sources, and you will be able to track back those answers to the peer-reviewed citations
This saves researchers like Jenny time - more time to experiment, less time sifting through papers

How is the creation of scientific literature changing? How will writing change?

Gen AI is already assisting in writing scientific papers
Publishers - including Elsevier - already allow submission of AI-assisted papers – so long as the use of AI is documented and transparent
We will likely see standards and conventions emerge to indicate AI generated text
And publishers will continue to spend significant resources developing more sophisticated tools to vet the authenticity, quality, and validity of science communication - and AI will be key to helping us do that
Communicating science findings might, in the future, not even revolve around the static “paper” anymore
Why not let data talk directly to data - why not let Jenny’s MRI outputs and lab notebooks and datasets inform a subsequent researcher’s experiments directly, without the intermediary static step of the output of a “paper”?
This is the promise we see when AI models converge with Open Science

How will understanding the scientific literature be reinvented? How will reading change?

Summarization and synthesis of the literature is a clear use case for genAI
This will make it easier for interdisciplinary researchers to grasp unfamiliar subject areas, and for non-scientists to understand complex topics
And unlike ChatGPT, the tools that are emerging summarize trusted, accurate databases of research and point to real citations for peer-reviewed literature, so you can easily validate the underlying sources
Scopus AI is currently in beta testing with users - you can type in a topic and get back a summary based on the 90 million peer-reviewed article abstracts in Scopus

Includes accurate citations for the papers that summary is based on

We are also experimenting with generating policy briefs on scientific topics - an advance that could be very useful for scientific advisors who work with lawmakers and regulatory bodies

Translation

Another way genAI will reinvent how we read and understand the scientific literature is through translation services
There is now the potential for scientists to understand one another regardless of their native language
Imagine if an Egyptian scientist could communicate her findings in Arabic and the rest of the world could easily comprehend them?
Think about the barrier that removes for her
This has the potential to greatly increase our global research output, as well as make the scholarly communication ecosystem more equitable and inclusive
This will finally be like having a “fish in your ear”

So these are just a few ways I see search, reading, and writing beginning to radically transform

What impact will this have on the information professions? What are the opportunities?

Opportunity for librarians to become metadata heroes

This is something that librarians have done for a long time - metadata creation is a core role for our profession - organizing and structuring data
Structured data has become a critical resource
It is what underpins these large models – and the better the structure, the more accurate and effective the AI can be
But too many large scientific datasets do not have consistent, reliable structured data
There is an urgent need for annotating, tagging, and structuring massive data sets so that AI models can be more accurate and can analyze and make sense of them better
Example:
- My friend Ajit is a researcher who is working on a new biomarker to be used with cancer therapies
- He needed to query a massive biomed repository to see if others were doing similar work
- The repository consisted of millions of large datasets – too large to be queried by traditional means, so he needed to build and use an AI agent to search and interrogate all the data
- But his agent kept spitting back garbage – why? Because the datasets weren’t consistently structured. This was very frustrating for him and wasted a lot of his time
- “I know the information I need is there, I just can’t get to it” - again, needle in a haystack
People like Ajit should be in the lab running experiments, not wrangling data
So we need more people – and I think they should be librarians – who have the data science skills to write tools that will automate structuring these large scientific datasets
MARC is not going to solve this problem
We need next-gen metadata librarians - we need librarians to become data scientists
We need to train librarians to meet the metadata needs of today – rather than of yesterday

Conclusion

GenAI will significantly disrupt how we read, write, and interact with the scientific literature
It has great promise for accelerating scientific discovery, by making the communication of findings more efficient and minimizing the time spent querying the literature
Scientists will spend more time thinking, questioning, and experimenting, and less time slogging through databases and writing up lit reviews
This is a good thing for humanity
