AI chatbots probably won't replace doctors any time soon

Publicly released: 2026-04-14 01:00

International

CC-0

Despite recent advances, artificial intelligence (AI) chatbots aren't very good at using reasoning to reach a medical diagnosis, so they should only be used with caution and under human supervision, according to US scientists. They tested 21 AI chatbots' diagnostic abilities in 29 different medical scenarios, and found that Grok4 performed best, while Gemini 1.5 Flash was worst. The chatbots struggled when they had to differentiate between possible diagnoses, but were better at providing a final diagnosis and suggesting treatment plans, the experts say. The findings suggest chatbots aren't yet smart enough to be deployed in diagnostic settings without human supervision, the authors conclude.

News release

From: JAMA

Large Language Model Performance and Clinical Reasoning Tasks

About The Study: The findings of this study suggest that despite progress, current large language models remain limited in early diagnostic reasoning and cannot yet be relied on for unsupervised patient-facing clinical decision-making.

Journal/conference: JAMA Network Open

Research: Paper

Organisation/s: Harvard Medical School, USA; Mass General Brigham, USA

Funder: Ms Rao is supported in part by award T32GM144273 from the National Institute of General Medical Sciences.