Monday, June 8, 2026

Bots are scraping open data — how should researchers respond?

A 3D illustration showing nine small, angry-looking box robots sitting at computers.

90% of open-access data repositories that are part of the Confederation of Open Access Repositories encounter bot scraping. Credit: fdmsd8yea/Getty

Should researchers still be posting their data openly online? It’s a question being debated by some researchers now that bots are routinely mining open-access databases and scientific publications to train artificial-intelligence tools — and in some cases analysing and combining data sets to churn out new results and papers faster than humans can.

Some researchers argue that the potential of automated science to be used for scientific ‘good’ — speeding up the discovery of new drug targets, for example — means that open data should remain open. But others point to evidence that bots scraping complex data sets can contribute to low-quality research and AI slop, while also allowing the extraction of sensitive data, including patient information. They argue that new rules and technical systems are needed to restrict bot access to databases.

“It’s a pretty big issue everybody should be thinking about, whether you’re for or against AI,” says Andrea Howard, a psychologist at Carleton University in Ottawa, Canada.

Privacy concerns

What is clear is that AI scraping is common. A survey published in June last year by the Confederation of Open Access Repositories found that more than 90% of the member organizations that responded encounter bot scraping, with most of them seeing abnormally high bot activity at least once a week (ref. 1). Often, that scraping is done to provide training data for AI models. Those data are also being used to produce new research outputs that are generated entirely by artificial-intelligence models.

“The scope and speed of how quickly automated pipelines can exhaust the research questions a data set can answer feels like a big change,” says Miri Forbes, a quantitative psychopathologist at Macquarie University in Sydney, Australia. “It shrinks the space left to work in a given data set.”

Last month, Forbes kicked off a discussion about open data sharing on the social-media platform Bluesky. The responses were divided. “Sharing information freely means ceding control and accepting that it may be used for any purpose, including those I don’t like,” responded one user on Bluesky. “It’s not your data anyway,” posted another.

Other people were less sanguine, pointing to a need for additional safeguards. “As a scientific community we need to solve this. We can’t have people fearing being scooped by AI,” posted one user.

Further concerns included that AI tools don’t always credit and cite researchers’ data in the same way that human researchers do, and that bots seem to be bypassing privacy protections and scraping sensitive personal data.

Olivia Kirtley, co-director of the Center for Contextual Psychiatry at KU Leuven in Belgium, conducts studies that involve people who experience suicidal ideation or who self-harm. “Participants could be put at risk through re-identification, sensitive data could be used for purposes for which it wasn’t intended or for which participants haven’t given consent,” she says. One study found that publicly available large language models could identify around one-quarter of people who had taken part in an interview-based project investigating people’s views on AI tools and whose personal details had been anonymized (ref. 2).

Controlled access
