
The reference integrity audit of 2.5 million biomedical papers spans 3 years of scientific publishing. Credit: Aramyan/Getty
An audit of 2.5 million academic papers has identified nearly 3,000 biomedical-science papers that contain fake references — ones that could not be traced to known publications.
The findings, published in The Lancet on 7 May (ref. 1), are contained in the first academic study to estimate the scale of fake citations in the biomedical literature.
The authors designed an automated pipeline to screen papers from PubMed Central — a database of publicly accessible biomedical articles — published between January 2023 and February 2026.
Their work suggests that the contamination of papers with fake citations is a rapidly growing problem in biomedicine. There were 12 times as many publications with fabricated citations in 2025 as in 2023 (see ‘Fabricated references on the rise’).

[Figure: ‘Fabricated references on the rise’. Source: Ref. 1]
The findings are “conservative underestimates”, says study co-author Maxim Topaz, an AI researcher at Columbia University in New York. “What we identified is the lower bound of true prevalence. We’re scratching the tip of the iceberg,” he adds.
Kathryn Weber-Boer, director of scientometrics at the London-based company Digital Science, agrees. The study is a “solid first initial contribution to the problem”, she says. (Digital Science is operated by Holtzbrinck Publishing Group, the majority shareholder of Springer Nature, which publishes Nature. Nature’s news team is editorially independent of its publisher.)
A Nature analysis published in April estimated that around 1.6% of publications from 2025 contained at least one reference to a publication that did not seem to exist.
Reference mismatches
In their study, Topaz and his colleagues developed a system to inspect the 125.6 million references cited by 2.5 million papers. They focused the analysis on 97 million references that had valid Digital Object Identifiers (DOIs) — unique strings of letters and numbers assigned by publishers and preprint repositories — or an ID assigned by the database PubMed.
They used large language models (LLMs) to flag mismatches between the article title in each reference and the title of the paper that its DOI or PubMed ID led to. They also searched for the references across four scholarly databases: PubMed, Crossref, OpenAlex and Google Scholar. If the title of a reference did not appear in any of these databases, the team considered it to be fabricated.
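The two checks described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps their LLM-based comparison for simple fuzzy string matching from Python's standard library, and the function names, the 0.9 similarity threshold and the in-memory list standing in for database search results are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(title: str) -> str:
    """Lowercase a title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def titles_match(cited: str, resolved: str, threshold: float = 0.9) -> bool:
    """Fuzzy title comparison (a stand-in for the LLM check; threshold is assumed)."""
    ratio = SequenceMatcher(None, normalize(cited), normalize(resolved)).ratio()
    return ratio >= threshold


def classify_reference(
    cited_title: str,
    resolved_title: Optional[str],
    database_titles: Iterable[str],
) -> str:
    """Classify one reference.

    cited_title     -- title as printed in the reference list
    resolved_title  -- title of the record the DOI or PubMed ID leads to
    database_titles -- hypothetical stand-in for titles returned by
                       searching scholarly databases
    """
    # Step 1: does the identifier lead to a paper with the cited title?
    if resolved_title is not None and titles_match(cited_title, resolved_title):
        return "ok"
    # Step 2: the identifier does not match; is the title known anywhere?
    if any(titles_match(cited_title, t) for t in database_titles):
        return "mismatch"  # real paper, wrong identifier
    return "suspected_fabricated"  # title not found in any database


print(classify_reference(
    "A made-up trial of nothing",
    "Gut microbiome in mice",
    ["Gut microbiome in mice", "Another paper"],
))
```

In this sketch, a reference counts as suspected fabricated only if its title fails both checks, which mirrors the conservative logic described in the study. Anything flagged this way would still need manual review.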
The analysis found 2,564 papers that contained one or two fabricated references, and 246 papers that contained three or more.
“Whether they’re fabricated by a computer or fabricated by a human being, that’s a question that remains open,” says Weber-Boer. But she adds that “the growth in the problem suggests that there is a generative AI component”.
In a manual check of 500 flagged references, three independent reviewers confirmed that the citations were fabricated in seven out of ten cases.
However, the analysis probably underestimates the total number of papers that include fake citations. “Google Scholar is not a reliable source” to verify references, notes Weber-Boer, because some fabricated references do appear on the site, but don’t trace back to genuine publications.


