AI outperforms doctors in Harvard emergency triage trial
A Harvard Medical School study published in Science found that an AI model outperformed human doctors in high-pressure emergency medicine triage, the critical moments when patients first arrive at a hospital.
How the trial worked
The experiment focused on 76 patients who arrived at a Boston hospital emergency room. An AI model (OpenAI’s o1 reasoning model) and a pair of human doctors each received the same standard electronic health record — vital signs, demographics, and a few sentences from a nurse describing why the patient was there.
Key results
- Initial triage: AI identified the exact or a very close diagnosis in 67% of cases, compared with 50%-55% for the human doctors.
- With more detail available: AI accuracy rose to 82%, compared to 70%-79% for expert humans (though this difference was not statistically significant).
- Long-term treatment plans: AI scored 89% versus 34% for humans using conventional tools like search engines.
The AI’s advantage was especially pronounced in triage scenarios requiring rapid decisions with minimal information.
A telling case
One patient presented with a blood clot in the lungs and worsening symptoms. Human doctors thought the anticoagulants were failing. The AI noticed something they missed: given the patient’s history of lupus, the lupus itself might be causing lung inflammation. The AI proved correct.
What it means
Arjun Manrai, a lead author at Harvard Medical School, noted that AI won’t replace doctors — but “we’re witnessing a really profound change in technology that will reshape medicine.”
Dr. Adam Rodman, another lead author and a practicing physician, predicted a new “triadic care model” over the next decade — the doctor, the patient, and an AI system working together. Patients, he stressed, “want humans to guide them through life or death decisions and through challenging treatment decisions.”
The study tested the AI only on text-based patient data. It did not account for the visual cues doctors observe, such as a patient’s distress level, appearance, or physical signals. In effect, the AI performed like a clinician offering a second opinion based on paperwork.
Prof. Ewen Harrison from the University of Edinburgh said the study showed “these systems are no longer just passing medical exams or solving artificial test cases. They are starting to look like useful second-opinion tools for clinicians.”
The adoption reality
Nearly one in five US physicians already use AI to assist diagnosis. In the UK, 16% of doctors use AI daily and 15% weekly, with clinical decision-making being one of the most common applications.
However, the lack of a formal accountability framework for AI errors remains a major concern. Dr. Wei Xing from the University of Sheffield also warned that some findings suggested doctors may unconsciously defer to AI answers rather than think independently, a tendency that could grow as AI becomes more routine in clinical settings.