Tuesday, May 26, 2026

Why AI can’t be trusted to write scientific reviews

Artificial-intelligence tools are being touted as a means to conduct rapid reviews of scientific literature. At the London-based publisher the Cochrane Collaboration, where I became editor-in-chief in March, we specialize in health-related systematic reviews: the highest-quality syntheses of all the available research. We are testing ways to use AI to increase our reviews’ efficiency and scale. But, in our experience, the current tools are far from ready for mainstream adoption, and the assumption that machines can replace humans on all methodological tasks is flawed.

The stakes are high. Systematic reviews and other types of evidence synthesis inform clinical practice, public-health guidance and policy decisions that affect entire populations. Errors could give false hope to patients or lead health systems to waste money on ineffective or unsafe interventions.

Current AI models typically replicate the step-by-step processes by which people conduct systematic reviews: identifying suitable studies from disparate sources; extracting relevant data for analysis; and, finally, writing up the report. The idea is to replace the work of humans.

But conducting systematic reviews is not a purely computational task. Human specialists are needed to define meaningful review questions, evaluate relevance, interpret results and understand clinical or policy implications. Context and subjective nuance are seldom well-represented in AI models’ training data, and the models’ tendency to hallucinate — that is, to fabricate information — means that their outputs need to be verified by human experts.

Efforts at Cochrane show the limitations of using AI in place of people. We’ve been evaluating tools that support study screening and data extraction. These are time-consuming processes to conduct manually, particularly when primary data are not readily accessible and must be drawn from multiple sources or inferred from published results.

But we’ve found that most of the tools available were developed by private companies. This is problematic for reviews that evaluate drugs and medical devices, because these need to be independent of industry. What’s more, few AI models are open source, with most relying on opaque, proprietary ‘black box’ processes. This means there’s no way to examine whether a tool might disproportionately include trials with results favourable to one drug company.

And, on the practical side, tools in the current generation require long training periods for both the AI and the human operator before they yield reliable results. So far, we’ve found that, for each review, the whole process takes longer than doing the work manually.

In my view, to realize the full potential of AI, it’s crucial that tool developers, authors and evidence users move away from using it to generate individual reviews. Instead of mimicking human processes, developers should start building systems that allow humans and AI to work together effectively to assess studies.
