Media release
Appearance of LLMs marked a huge shift in the vocabulary of biomedical papers
Science Advances
The appearance of large language models (LLMs) caused a drastic shift in the vocabulary of academic writing, according to an analysis of more than 15 million biomedical abstracts published from 2010 to 2024. Results suggest that as many as 13.5% of abstracts in 2024 included excess words known to be favored by LLMs.

There has been much speculation about whether and how the arrival of LLMs influenced scientific writing. To investigate this question, Dmitry Kobak and colleagues adapted an existing public health approach to analyze LLMs’ influence on biomedical abstracts published over a recent span of 14 years. During the COVID-19 pandemic, studies compared excess deaths during the pandemic with pre-COVID fatalities to deduce SARS-CoV-2’s impact on mortality. Kobak et al. applied the same logic to develop an excess word framework and parsed 15.1 million abstracts indexed in the scientific literature archive PubMed. The analyses revealed that LLMs’ emergence sparked an increase in the frequency of certain style words, including “delves,” “showcasing,” “underscores,” “potential,” “findings,” and “critical.” From this, the team estimated that 13.5% of abstracts published in 2024 could have involved LLM processing.

They also compared the effects of the pandemic and of LLMs on scientific writing, concluding that the impact of LLMs surpasses that of COVID. Excess vocabulary during the pandemic consisted of relevant content words such as “respiratory,” but excess vocabulary in 2024 consisted mainly of style words. Before 2024, 79.2% of excess words were nouns; in 2024, 66% were verbs and 14% were adjectives. The team also identified notable differences in LLM usage between research fields, countries, and venues.
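The excess word idea described above can be illustrated with a minimal sketch. It assumes a simple linear extrapolation from two pre-LLM baseline years to get the expected frequency of a word, then compares that with the observed frequency. The function names, the choice of baseline years, and the frequencies are hypothetical and are not taken from the study, which may compute its counterfactual differently.

```python
def expected_frequency(freq_by_year, baseline_years, target_year):
    """Linearly extrapolate a word's frequency from two baseline years.

    Assumption: a straight-line trend is a reasonable counterfactual.
    This is an illustration, not the authors' exact procedure.
    """
    y0, y1 = baseline_years
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    return freq_by_year[y1] + slope * (target_year - y1)


def excess_usage(freq_by_year, baseline_years, target_year):
    """Return (gap, ratio) between observed and expected frequency."""
    observed = freq_by_year[target_year]
    expected = expected_frequency(freq_by_year, baseline_years, target_year)
    gap = observed - expected  # absolute excess, like excess deaths
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical share of abstracts containing one style word, by year.
toy_frequencies = {2021: 0.0010, 2022: 0.0012, 2024: 0.0200}

gap, ratio = excess_usage(toy_frequencies, (2021, 2022), 2024)
print(f"excess gap: {gap:.4f}, excess ratio: {ratio:.1f}")
```

A word counts as "excess" in this sketch when the gap or the ratio is large. Any threshold used to flag such words would be a further assumption.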