In Harvard study, AI offered more accurate diagnoses than emergency room doctors
A Harvard Medical School study found that OpenAI's o1 and 4o models produced diagnoses that were more accurate than, or comparable to, those of emergency room physicians in real patient cases. The models outperformed doctors especially during initial triage, when limited patient information is available. Researchers emphasized the need for further real-world trials and noted current limitations in AI's ability to process non-text medical data.
- The study compared AI-generated diagnoses from OpenAI’s o1 and 4o models with those of two attending physicians using 76 real ER cases from Beth Israel Deaconess Medical Center.
- At the initial triage stage, the o1 model provided exact or very close diagnoses in 67% of cases, compared to 55% and 50% for the two physicians.
- Diagnoses were evaluated by two other physicians who were blinded to whether each diagnosis came from a human or an AI.
- The researchers fed raw electronic medical record data to the AI models without pre-processing.
- The study calls for prospective trials to assess AI in real-world clinical settings and highlights the lack of accountability frameworks for AI in medical diagnosis.
Opening excerpt (first ~120 words)
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors. The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians. In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two attending physicians to those generated by OpenAI’s o1 and 4o models.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at TechCrunch.