Media release
AI chatbots overgeneralise when asked to summarise scientific papers, creating a “significant risk of large-scale misinterpretations of research findings”. An analysis of 4,900 chatbot-generated summaries of papers, on topics ranging from coffee’s health benefits to climate change beliefs, found that some chatbots were nearly five times more likely to overgeneralise findings than human-written summaries were. Ironically, prompting the chatbots for accuracy increased this tendency. The researchers call for stronger safeguards and highlight potential mitigation strategies, such as benchmarking large language models for accuracy.