Artificial intelligence (AI) can make better decisions than doctors in hospital emergency departments, where patient information is incomplete and decisions that must be made quickly can be a matter of life or death, according to research led by Harvard Medical School published today in the journal Science.
The doctors and bioinformaticians who conducted the study advocate integrating AI into clinical decision-making to improve patient care, but always under human responsibility and control.
“Our results do not mean that AI will replace doctors. But we are facing a major change that will transform medicine,” said Arjun Manrai, bioinformatician at Harvard Medical School and co-director of the work, at a press conference on Tuesday.
The researchers evaluated OpenAI’s o1 model on the types of situations that arise in emergency rooms: patient triage upon arrival at the hospital; reasoning to narrow down the diagnosis; deciding which medical tests to perform; and deciding where to refer patients when they leave the emergency room.
They did this by presenting it with complex clinical cases as if it were taking a medical exam. They compared o1’s results with those of doctors from Harvard and Stanford universities who evaluated the same cases, as well as with the GPT-4o system (an evolved version of the popular GPT-4). In all situations, o1 outperformed GPT-4o, the doctors, and even doctors who used GPT-4 as an aid.
In the definitive test demonstrating AI’s potential in emergencies, the researchers presented the real cases of 76 patients who came to Beth Israel Deaconess Hospital in Boston, affiliated with Harvard Medical School. “We input the raw, unedited text, with the countless distractors and all the random noise that clinical cases come with. With the confusing data of the real world, o1 also outperforms doctors,” said Peter Brodeur, an internist at the Boston hospital and co-first author of the research.
At initial triage, o1 made the correct diagnosis in 67% of cases, compared to an accuracy rate between 50% and 55% for doctors. In the following hours, with the results of more tests available, o1 raised its diagnostic accuracy to 82%, while doctors reached 70-79%.
(The text continues after the description of the real cases)
Three real cases
The nurse with a sugar drop. A man, a nurse by profession, presented to the emergency room at Beth Israel Deaconess Hospital in Boston with a case of hypoglycemia (low blood sugar). Initially, doctors suspected a diabetes problem. AI detected that it was due to a rare type of cancer affecting the pancreas.
The false lead of anticoagulants. A man arrived at the hospital with pulmonary thromboembolism, a serious disorder in which a blood clot blocks a pulmonary artery. He received anticoagulant treatment and initially improved, but soon after worsened again. Doctors mistakenly attributed it to treatment failure. AI noted that the patient had a history of lupus and correctly attributed the inflammation of the lungs and heart to the autoimmune disease.
From the lungs to the scrotum. A patient arrived at the emergency room with generic infection symptoms. He had received a transplant and was taking immunosuppressive medication, so it was not unusual for him to suffer an infection and for his immune system to be unable to control it. He mainly complained of respiratory symptoms, which led to suspicion of a lung infection. He had also reported pain in the scrotum, which had not been given much importance. AI correctly diagnosed, before the doctors, that he was suffering from a necrotizing infection of the scrotum that required surgery.
“These models could be one of the most impactful technologies [in medicine] in decades,” said Adam Rodman, an internist at Beth Israel Deaconess Hospital and co-director of the research, at the press conference. But he warned that they are not yet ready for large-scale use.
First, prospective clinical trials will be necessary to verify whether incorporating AI into clinical decision-making is beneficial for patients, the authors of the research emphasize. Additionally, “health systems need to prepare to invest in computing infrastructure and to (…) facilitate the safe integration of AI tools into patient care,” they write in Science.
An additional problem to be addressed in the coming years, Peter Brodeur added at the press conference, is that “these models are extremely capable, but doctors do not know how to get the most out of them.”
The researchers observed that o1 is not only more accurate than doctors but also reasons better. “We evaluated diagnostic reasoning because sometimes one arrives at the correct diagnosis for the wrong reasons,” explained Brodeur. The evaluation used cases drawn from the NEJM Healer database, created by The New England Journal of Medicine as part of its educational and clinical reasoning content. o1 demonstrated optimal reasoning in 97.5% of cases, far above the 35% achieved by attending physicians and the 22% achieved by resident doctors. And this despite the fact that the model “was not trained for medical reasoning but to predict the next word,” notes Arjun Manrai, co-director of the research.
The AI system has been evaluated only with text documents, so the researchers expect its performance to improve when it also incorporates images and sounds into its evaluations, as doctors do when they take into account their patients’ physical appearance or the sound of their breathing, among other sensory signals.
Still, clinical decisions belong to doctors and not to AI, since AI can hallucinate and there are no mechanisms to hold it accountable for its errors. “We cannot expect doctors to detect hallucinations because AI models are equally convincing when they are right as when they are wrong; there must be humans in the decision circuit as a safety mechanism,” said Adam Rodman, for whom “AI has to be an extension of the doctor, not a substitute.”