
Cheap AI chatbots transform medical diagnoses in places with limited care

An elderly man holds a child while a doctor uses a stethoscope to listen to their chest in a clinic on the outskirts of Hyderabad.

LLMs greatly improved physicians’ diagnostic accuracy.Credit: Rizwan Tabassum/AFP via Getty

Large language models (LLMs) can pass postgraduate medical examinations and help clinicians to make diagnoses, at least in controlled benchmarking tests. But are they useful in real-world settings, which have too few physicians to check the answers, as well as long patient lists and limited resources?

Two studies published in Nature Health on 6 February suggest that they are up to the task. The work reveals that cheap-to-use LLMs can boost diagnostic success rates, even outperforming trained clinicians, in health-care settings in Rwanda1 and Pakistan2.

In Rwanda, chatbot answers outscored those of local clinicians across every metric assessed. And in Pakistan, physicians using LLMs to aid their diagnosis achieved a mean diagnostic reasoning score of 71%, versus 43% for those using conventional resources.

“The papers highlight how LLMs might be able to support clinicians in lower- and middle-income countries to improve the level of care,” says Caroline Green, director of research at the Institute for Ethics in AI at the University of Oxford, UK.

Real-world complexity

In the Rwanda study, researchers tested whether LLMs could give accurate clinical information to patients in low-resource health systems across four districts. A common problem there is that there are too few doctors and nurses to see all patients, so most people are seen and triaged by community workers with little training, says study co-author Bilal Mateen, the London-based chief AI officer at PATH, a global non-profit organization that is dedicated to health equity.

Mateen’s team asked about 100 community health workers to compile a list of more than 5,600 clinical questions they tend to receive from patients.

The researchers compared the responses generated by five LLMs to roughly 500 of these questions against answers from trained local clinicians. Grading the responses on a 5-point scale revealed that all the LLMs outperformed local clinicians across all 11 metrics, which included alignment with established medical consensus, understanding the question and the likelihood of the response leading to harm. The team also demonstrated that the LLMs could answer roughly 100 questions in Kinyarwanda, the national language of Rwanda.

Mateen says that LLMs have another advantage: they are available for consultation by a community health worker around the clock, which isn’t the case for physicians. LLMs were also more than 500 times cheaper per response — clinician-generated answers cost an average of US$5.43 for doctors and $3.80 for nurses, whereas LLM responses cost $0.0035 in English and $0.0044 in Kinyarwanda.

This study “suggests that commercial LLMs are able to give medically and culturally appropriate responses to common queries”, says Adam Rodman, a clinical and AI researcher at Beth Israel Deaconess Medical Center in Boston, Massachusetts.

However, Rodman remains sceptical about comparing LLMs with human performance. Evaluating written answers in this way is good at measuring models, he says, but less good at measuring how humans perform.

Diagnostic accuracy

In Pakistan, researchers led by Ihsan Qazi, a computer scientist at the Lahore University of Management Sciences, found that LLMs can boost diagnostic accuracy in low-resource health-care settings2. There, says Qazi, a paucity of medical specialists and enormous patient loads cause a high number of diagnostic errors.

Qazi’s team conducted a randomized controlled trial in which 58 licensed physicians received 20 hours of training in how to use LLMs to assist with diagnosing patients’ symptoms and to be wary of mistakes or hallucinations made by the programs.

Physicians who had access to the GPT-4o LLM had substantially better diagnostic accuracy ratings when reviewing clinical cases than did those using only PubMed and internet searches, achieving a mean diagnostic reasoning score of 71% versus 43% for those using conventional resources.

A health worker educates patients on their arrival at the Mpox treatment centre at Nyiragongo general referral hospital, north of the town of Goma, Democratic Republic of Congo.

AI could help doctors and nurses see and triage more patients in clinics with limited resources.Credit: Guerchom Ndebo/AFP via Getty

A secondary analysis found that an LLM alone achieved better scores than did physicians assisted by an LLM. However, there were exceptions. In 31% of cases, the physicians did better than the median lone AI performance. “It turned out that these cases involved red flags, contextual factors, which the LLM seems to have missed,” says Qazi.

Qazi expects his results to be applicable to other countries, but says they need to be replicated using other chatbots. “This work opens up new avenues that can eventually lead to more safe and effective integration of AI and health care,” he says.
