Fake medical datasets created by ChatGPT are pretty hard to spot

Publicly released:
International
Image credit: CC-0. https://pixabay.com/photos/matrix-data-network-software-code-4493783/

Italian and German scientists created fake medical datasets using ChatGPT and then looked for characteristics that marked these datasets out as phonies. The team used GPT-4o to produce 12 'unrefined' datasets, and a custom version of ChatGPT to create 12 'refined' datasets based on the unrefined data. The unrefined datasets contained 103 signs of fakery, including mismatches between patient names and gender, visits conducted at weekends, and age calculation errors. Once the datasets were refined by the custom ChatGPT, however, far fewer of these tell-tale signs remained, and four of the refined datasets appeared completely authentic when analysed. The findings show how easy it is to use artificial intelligence to create sham medical datasets that can pass for the real thing, the team concludes.
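As a rough illustration (not the study's actual method), the kinds of consistency checks described above can be scripted in a few lines of Python with pandas. Everything below is hypothetical: the column names, the sample rows, and the toy name list are made up for the example.

    import pandas as pd

    # Toy records; column names and values are hypothetical, for illustration only.
    df = pd.DataFrame({
        "first_name":   ["Giulia", "Marco", "Anna"],
        "recorded_sex": ["F", "M", "M"],              # "Anna" recorded as male: a mismatch
        "birth_date":   ["1980-05-02", "1975-11-30", "1990-01-15"],
        "visit_date":   ["2024-03-09", "2024-03-11", "2024-03-12"],  # 2024-03-09 is a Saturday
        "recorded_age": [43, 48, 40],                 # 40 disagrees with the dates below
    })
    df["birth_date"] = pd.to_datetime(df["birth_date"])
    df["visit_date"] = pd.to_datetime(df["visit_date"])

    # Check 1: first name vs recorded sex, using a (toy) reference list of female names.
    female_names = {"Giulia", "Anna"}
    name_sex_mismatch = df["first_name"].isin(female_names) & (df["recorded_sex"] != "F")

    # Check 2: visits dated on a weekend (dayofweek: Monday=0 ... Sunday=6).
    weekend_visit = df["visit_date"].dt.dayofweek >= 5

    # Check 3: recorded age inconsistent with birth and visit dates (1-year tolerance).
    implied_age = (df["visit_date"] - df["birth_date"]).dt.days // 365
    age_mismatch = (df["recorded_age"] - implied_age).abs() > 1

    print(df[name_sex_mismatch])   # rows with a name/sex mismatch
    print(df[weekend_visit])       # rows with weekend visit dates
    print(df[age_mismatch])        # rows with inconsistent ages

Checks of this sort caught the unrefined datasets; the refined ones were precisely the ones that slipped past them.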

Attachments

Research: JAMA, web page (URL will go live after the embargo ends)
Editorial / Opinion: JAMA, web page (URL will go live after the embargo ends)
Journal/conference: JAMA Ophthalmology
Research: Link to Paper 1 | Paper 2
Organisation/s: University of Cagliari, Italy; Bascom Palmer Eye Institute, USA
Funder: No information provided.