Wednesday, May 21, 2025

AI linked to explosion of low-quality biomedical research papers


Health data from thousands of people is publicly available and ready to plug into AI systems for analysis. Credit: BSIP/UIG via Getty

The scientific literature is at risk of becoming flooded with papers that make misleading health claims based on openly available data that are easy to process using artificial intelligence (AI) tools, researchers have warned.

In a study published in PLoS Biology on 8 May [1], scientists analysed more than 300 papers that used data from the US National Health and Nutrition Examination Survey (NHANES), an open data set of health records. The papers all seemed to follow a similar template, associating one variable (for example, vitamin D levels or sleep quality) with a complex disorder such as depression or heart disease, ignoring the fact that these conditions have many contributing factors.

“We have a sudden explosion in publication rates [of papers] that are extremely formulaic that could easily have been generated by large language models,” says study co-author Matt Spick, a biomedical scientist at the University of Surrey in Guildford, UK.

Spick and his colleagues found that the associations in many of the papers did not hold up to statistical scrutiny, and that some studies seemed to have cherry-picked data.

“Imagine you’re trying to pass an exam that has a particular pass rate, and you add as many questions as you want. You see which ones you got right, and you remove the ones that you got wrong. That’s basically what they’re doing,” explains Charlie Harrison, a computational biologist at Aberystwyth University in Ceredigion, UK, who also worked on the study.
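Harrison's exam analogy can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the study: it assumes 200 hypothetical association tests in which no real effect exists, so each p-value is drawn uniformly between 0 and 1, and it compares a naive count of "significant" results with the count that survives a Benjamini-Hochberg false-discovery-rate correction. The Benjamini-Hochberg procedure is a standard technique chosen here for illustration; the article does not say which checks the authors applied. The function names, the number of tests and the random seed are all assumptions.

```python
import random


def benjamini_hochberg(p_values, alpha=0.05):
    """Return how many hypotheses the Benjamini-Hochberg procedure rejects.

    The p-values are sorted in ascending order and the procedure finds the
    largest rank k such that the k-th smallest p-value is <= (k / m) * alpha.
    The k smallest p-values are then declared significant.
    """
    m = len(p_values)
    rejected = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * alpha:
            rejected = k
    return rejected


def simulate_null_p_values(n_tests, seed=0):
    """Hypothetical helper: p-values for tests where no true effect exists.

    Under the null hypothesis a p-value is uniformly distributed on [0, 1],
    so uniform draws stand in for real association tests here.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_tests)]


alpha = 0.05
p_values = simulate_null_p_values(200)

# "Keep the questions you got right": count every test that clears 0.05.
raw_hits = sum(p <= alpha for p in p_values)

# Account for the fact that 200 tests were run, not one.
corrected_hits = benjamini_hochberg(p_values, alpha)

print(f"Uncorrected 'significant' associations: {raw_hits}")
print(f"Surviving Benjamini-Hochberg correction: {corrected_hits}")
```

With a 0.05 threshold, roughly one in twenty tests of a non-existent effect will look significant by chance alone, so a paper that runs many tests and reports only the ones that pass can present noise as a finding. A correction of this kind asks each result to clear a bar that rises with the number of tests performed.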

Ioana Alina Cristea, a clinical psychologist and meta-researcher at the University of Padua, Italy, agrees that the papers “seem to be written with a recipe”.

“We need these systematic evaluations to get some way to gauge the extent of the problem,” she says.

Surge in studies

NHANES is a long-running survey that collects data from thousands of people in the United States about their health, diet and lifestyle. The data set is publicly available and ready to plug into coding or AI systems for analysis, which has led to an increase in studies based on NHANES data over the past two years, Spick says. In 2024 alone, more than 2,200 association studies using NHANES data were published, and more than 1,200 have been published so far this year, according to the PubMed index of biomedical literature.

Harrison, Spick and their colleagues focused on a sample of 341 studies published between 2014 and 2024 that were based on NHANES data. The papers appeared in 147 journals produced by a range of publishers, including Frontiers Media, Elsevier and Springer Nature (Nature’s news team is editorially independent of its publisher).
