AI chatbots might be helping to write more than 1 in 10 biomedical research papers

Publicly released:
International
Photo by Solen Feyissa on Unsplash

ChatGPT and other AI-based large language models (LLMs) may be helping to write more than 1 in 10 biomedical research papers, according to international research. The research team looked at the language used in the abstracts of biomedical research papers from 2010 to 2024 and found that after LLMs emerged, the frequency of certain words, such as “delves,” “showcasing,” and “underscores,” increased. From this, the team estimated that 13.5% of abstracts published in 2024 could have involved LLM processing. They also found that LLMs changed scientific writing more than the pandemic did.

Media release

From: AAAS

Appearance of LLMs marked a huge shift in the vocabulary of biomedical papers

Science Advances

The appearance of large language models (LLMs) caused a drastic shift in the vocabulary of academic writing, according to an analysis of more than 15 million biomedical abstracts published from 2010 to 2024. Results suggest that as much as 13.5% of abstracts in 2024 included excess words known to be favored by LLMs. There has been much speculation about whether and how the arrival of LLMs has influenced scientific writing. To investigate this question, Dmitry Kobak and colleagues adapted an existing public health approach to analyze LLMs’ influence on biomedical abstracts published over a recent span of 14 years. During the COVID-19 pandemic, studies used a framework that compared deaths during the pandemic with pre-COVID fatalities, treating the excess as a measure of SARS-CoV-2’s impact on mortality. Kobak et al. applied the same logic to develop an excess word framework and parsed 15.1 million abstracts in the scientific literature archive PubMed. The analyses revealed that the emergence of LLMs sparked an increase in the frequency of certain stylistic words, including “delves,” “showcasing,” “underscores,” “potential,” “findings,” and “critical.” From this, the team estimated that 13.5% of abstracts published in 2024 could have involved LLM processing. They also compared the impact of the pandemic and of LLMs on scientific writing, concluding that the impact of LLMs surpasses that of COVID. Excess vocabulary during the pandemic consisted of relevant content words such as “respiratory,” whereas excess vocabulary in 2024 consisted mainly of style words. Before 2024, 79.2% of excess words were nouns; in 2024, 66% were verbs and 14% were adjectives. The team also identified notable differences in LLM usage between research fields, countries, and venues.
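The excess word idea described above can be illustrated with a short Python sketch. The release does not say how the pre-LLM baseline is computed, so the linear projection below is an assumption, as are the function names and the word frequencies, which are made up purely for illustration and are not taken from the study.

```python
from statistics import mean


def linear_projection(years, freqs, target_year):
    """Fit a least-squares line through (year, frequency) points
    and evaluate it at target_year."""
    x_bar, y_bar = mean(years), mean(freqs)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, freqs))
    slope = sxy / sxx
    return y_bar + slope * (target_year - x_bar)


def excess_usage(baseline, observed_freq, target_year):
    """Compare an observed word frequency with the frequency expected
    from the baseline trend (assumed linear here).

    Returns (expected, gap, ratio), where gap = observed - expected
    and ratio = observed / expected.
    """
    years = sorted(baseline)
    freqs = [baseline[y] for y in years]
    # A frequency cannot be negative, so clamp the projection at zero.
    expected = max(linear_projection(years, freqs, target_year), 0.0)
    gap = observed_freq - expected
    ratio = observed_freq / expected if expected > 0 else float("inf")
    return expected, gap, ratio


# Hypothetical share of abstracts containing one word in pre-LLM years.
baseline = {2019: 0.0010, 2020: 0.0011, 2021: 0.0012, 2022: 0.0013}

# Hypothetical observed share of 2024 abstracts containing the same word.
expected, gap, ratio = excess_usage(baseline, observed_freq=0.0150, target_year=2024)
print(f"expected={expected:.4f} gap={gap:.4f} ratio={ratio:.1f}")
```

A word whose observed frequency far exceeds its projected frequency would count as an excess word in this sketch, in the same way that deaths above the projected baseline count as excess deaths.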

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research AAAS, Web page Please link to the article in online versions of your report
Journal/conference: Science Advances
Research: Paper
Organisation/s: University of Tübingen, Germany
Funder: R.G.-M. was funded by the Deutsche Forschungsgemeinschaft (KO6282/2-1). D.K., R.G.-M., and J.L. are supported by the Gemeinnützige Hertie-Stiftung. We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting R.G.-M. D.K. is a member of Germany’s Excellence Cluster 2064 “Machine Learning—New Perspectives for Science” (EXC 390727645). E.-Á. H.’s work is supported by NSF CAREER grant no. IIS-1943506. We acknowledge support from the Open Access Publication Fund of the University of Tübingen.