AI summaries of scientific research oversimplify findings

Publicly released:
International

Artificial intelligence chatbots driven by large language models (LLMs) tend to exaggerate the scope of research when summarising scientific papers, according to Dutch and UK researchers. They analysed 4,900 chatbot-generated summaries of scientific abstracts (which are themselves short summaries at the start of a scientific paper) and found the chatbot summaries were five times more likely to overgeneralise findings than summaries written by human experts. The researchers did not ask the chatbots to write the summaries for an expert audience, but they did request 'systematic, detailed and faithful abstract summaries' in their prompts. Ironically, prompting for accuracy increased overgeneralisation, and newer LLMs were less accurate than older ones.

Media release

From:

AI chatbots overgeneralise when asked to summarise scientific papers, creating a “significant risk of large-scale misinterpretations of research findings”. Analysis of 4,900 chatbot-generated summaries of papers, spanning topics from coffee’s health benefits to climate change beliefs, found some chatbots were nearly five times more likely to overgeneralise findings than human-written summaries were. Ironically, prompting for accuracy increased this tendency. The researchers call for stronger safeguards and highlight potential mitigation strategies, such as benchmarking large language models for accuracy.

Journal/conference: Royal Society Open Science
Research: Paper
Organisation/s: Utrecht University, Cambridge University
Funder: No funding has been received for this article.