These medical AIs for diagnosis and treatment decisions are at least as good as doctors

Publicly released:
International
CC-0

In a pair of papers, scientists outline two medical artificial intelligence (AI) models they say are at least as effective as doctors. In the first, German scientists describe MIRA (Medical Intelligence for Reasoning and Action), an AI with access to patient data in an isolated electronic health record system. The team evaluated MIRA using real-world data from more than 500 emergency department cases. MIRA can choose from over 85,000 options to order diagnostic tests, interpret the results, and make treatment plans, including prescribing medication, scheduling procedures and arranging admissions. Its average diagnostic accuracy was 87.8%, compared with 78.1% for a panel of six doctors. In the second study, Google scientists introduce AMIE (Articulate Medical Intelligence Explorer), an AI system that uses the company's Gemini model and is designed for clinical management and conversations. In tests, AMIE performed as well as real doctors in management reasoning, and it beat them in preciseness of treatments and investigations, in alignment with clinical guidelines, and in grounding its management plans in those guidelines. Although the results are impressive, both AIs need further work before they could be introduced as part of clinical care, the authors conclude.

News release

From: Springer Nature

Medical AI models for patient management

Two independent AI models that can assist with multiple stages of patient management, from diagnosis to treatment decisions, are presented in Nature this week. The systems, MIRA (Medical Intelligence for Reasoning and Action) and Google's AMIE (Articulate Medical Intelligence Explorer), perform at least as well as physicians, demonstrating the potential for conversational AI tools to help with disease management.

Large language models (LLMs) have shown promising developments for clinical applications, but they tend to specialize in narrowly defined tasks. The clinical management of patients requires a multifaceted approach, delving into patient histories, carrying out appropriate investigations, making accurate diagnoses, planning treatment options (both pharmaceutical and surgical), and monitoring outcomes over multiple visits. If AI agents could carry out such tasks, achieving effective management reasoning, they may be able to assist physicians in routine tasks and possibly address physician shortages in some regions of the world. Two papers in Nature report advances in the capabilities of autonomous medical AI agents.

Jakob Kather and colleagues describe MIRA, an AI model that has access to patient data in an isolated electronic health record system. The model is evaluated using real-world data from more than 500 emergency department clinical cases. MIRA gathers information via chat with a patient AI agent whose responses match documented histories taken from clinical notes. MIRA can choose from over 85,000 options to order diagnostic tests, interpret the results, and make treatment plans including prescribing medication, scheduling procedures and arranging admissions. It achieved an average diagnostic accuracy of 87.8%, compared to 78.1% from a panel of six physicians across specialities. Future work is needed to further improve accuracy and establish generalization in real-world studies, the authors conclude.

Mike Schaekermann and colleagues describe AMIE, an LLM-based system optimized for clinical management and conversations. The model can perform continuous reasoning over multiple patient visits to map the progression of disease and responses to treatment. AMIE uses Gemini to analyse the information retrieved from the patient and align its output with relevant and up-to-date clinical practice guidelines and drug formularies (lists of approved, clinically preferred medications). In a virtual clinical examination study, AMIE was compared to 21 primary care physicians across 100 multi-visit case scenarios and five medical specialities, designed to reflect UK NICE guidance and BMJ Best Practice guidelines. AMIE performed as well as real physicians in management reasoning capabilities, and better than physicians in preciseness of treatments and investigations and in its alignment with clinical guidelines and grounding of management plans in those guidelines. On a newly introduced benchmark for medication reasoning (RxQA), AMIE outperformed physicians on difficult cases. The authors note that more work is needed before AMIE is ready for clinical care but conclude that this work represents a step towards the use of conversational AI tools to assist physicians in disease management.

Attachments

Note: Not all attachments are visible to the general public. Research URLs will go live after the embargo ends.

Research Springer Nature, Web page Paper 1. The URL will go live after the embargo ends
Research Springer Nature, Web page Paper 2. The URL will go live after the embargo ends
Journal/conference: Nature
Research: Link to Paper 1 | Paper 2
Organisation/s: Heidelberg University Hospital, Germany, Google Research, USA
Funder: Paper 1: J.N.K. is supported by the German Cancer Aid (DECADE, 70115166), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan; Come2Data, 16DKZ2044A; DEEP-HCC, 031L0315A; DECIPHER-M, 01KD2420A; NextBIG, 01ZU2402A), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (TransplantKI, 01VSF21048), the European Union’s Horizon Europe and innovation programme (ODELIA, 101057091; GENIAL, 101096312), the European Research Council (ERC; NADIR, 101114631), the National Institutes of Health (EPICO, R01 CA263318) and the National Institute for Health and Care Research (NIHR, NIHR203331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. J.C. is supported by the Mildred-Scheel-Postdoktorandenprogramm of the German Cancer Aid (grant #70115730). D.T. is funded by the German Federal Ministry of Education and Research (TRANSFORM LIVER, 031L0312A), the European Union’s Horizon Europe and innovation programme (ODELIA, 101057091), and the German Federal Ministry of Health (SWAG, 01KD2215B). D.F. has received a research grant from OpenAI. G.W. is supported by Lothian NHS. Paper 2: This study was funded by Alphabet Inc and/or a subsidiary thereof (‘Alphabet’).