What counts as plagiarism? AI-generated papers pose new risks

This January, Byeongjun Park, a researcher in artificial intelligence (AI), received a surprising e-mail. Two researchers from India told him that an AI-generated manuscript had used methods from one of his papers, without credit.

Park looked up the manuscript. It wasn’t formally published, but had been posted online (see go.nature.com/45pdgqb) as one of a number of papers generated by a tool called The AI Scientist — announced in 2024 by researchers at Sakana AI, a company in Tokyo [1].

The AI Scientist is an example of fully automated research in computer science. The tool uses a large language model (LLM) to generate ideas, writes and runs the code by itself, and then writes up the results as a research paper — clearly marked as AI-generated. It’s the start of an effort to have AI systems make their own research discoveries, says the team behind it.

The AI-generated work wasn’t copying his paper directly, Park saw. It proposed a new architecture for diffusion models, the sort of model behind image-generating tools. Park’s paper dealt with improving how those models are trained [2]. But to his eyes, the two did share similar methods. “I was surprised by how closely the core methodology resembled that of my paper,” says Park, who works at the Korea Advanced Institute of Science and Technology (KAIST) in Daejeon, South Korea.

The researchers who e-mailed Park, Tarun Gupta and Danish Pruthi, are computer scientists at the Indian Institute of Science in Bengaluru. They say that the issue is bigger than just his paper.

In February, Gupta and Pruthi reported [3] that they’d found multiple examples of AI-generated manuscripts that, according to external experts they consulted, used others’ ideas without attribution, although without directly copying words and sentences.

Gupta and Pruthi say that this amounts to the software tools plagiarizing other ideas — albeit with no ill intention on the part of their creators. “A significant portion of LLM-generated research ideas appear novel on the surface but are actually skillfully plagiarized in ways that make their originality difficult to verify,” they write.

In July, their work won an ‘outstanding paper’ award at the Association for Computational Linguistics conference in Vienna.

But some of their findings are disputed. The team behind The AI Scientist told Nature that it strongly disagrees with Gupta and Pruthi’s findings, and doesn’t accept that any plagiarism occurred in The AI Scientist case studies that the paper examines. In Park’s specific case, one independent specialist told Nature that he thought the AI manuscript’s methods didn’t overlap enough with Park’s paper to be termed plagiarism. Park himself also demurred at using ‘plagiarism’ to describe what he saw as a strong methodological overlap.

Beyond the specific debate about The AI Scientist lies a broader concern. So many papers are published each year — especially in computer science — that researchers already struggle to keep track of whether their ideas are really innovative, says Joeran Beel, a specialist in machine learning and information science at the University of Siegen, Germany.

And if more LLM-based tools are used to generate ideas, this could deepen the erosion of intellectual credit in science. Because LLMs work in part by remixing and interpolating the text they’re trained on, it would be natural for them to borrow from earlier work, says Parshin Shojaee, a computer scientist at the Virginia Tech Research Center — Arlington.

The issue of ‘idea plagiarism’, although little discussed, is already a problem with human-authored papers, says Debora Weber-Wulff, a plagiarism researcher at the University of Applied Sciences, Berlin, and she expects that it will get worse with work created by AI. But, unlike the more familiar forms of plagiarism — involving copied or subtly rewritten sentences — it’s hard to prove the reuse of ideas, she says.

That makes it difficult to see how the task of checking for true novelty or originality could be automated to match the pace at which AI systems will be able to synthesize manuscripts.

“There’s no one way to prove idea plagiarism,” Weber-Wulff says.

Overlapping methods

Bad actors can, of course, already use AI to deliberately plagiarize others or rewrite others’ work to pass it off as their own (see Nature https://doi.org/gt5rjz; 2025). But Gupta and Pruthi wondered if well-intentioned AI approaches might be using others’ methods or ideas too.

Gupta and Pruthi were first alerted to the issue when they read a 2024 study led by Chenglei Si, a computer scientist at Stanford University in California [4]. Si’s team asked both people and LLMs to generate “novel research ideas” on topics in computer science. Although Si’s protocol included a novelty check and asked human reviewers to assess the ideas, Gupta and Pruthi argue that some of the AI-generated ideas produced by the protocol nevertheless lifted from existing works — and so weren’t ‘novel’ at all.

They picked out one of the AI-generated ideas in Si’s paper, which they say borrowed from a paper first posted as a preprint [5] in 2023. Si tells Nature that he agrees that the ‘high-level’ idea was similar to material in the preprint, but that “whether the low-level implementation differences count as novelty is probably a subjective judgement”. Shubhendu Trivedi, a machine-learning researcher who co-authored that 2023 preprint and who was until recently at the Massachusetts Institute of Technology in Cambridge, says that “the LLM-generated paper was basically very similar to our paper, despite some superficial-level differences”.

Gupta and Pruthi further tested their concern by taking the four AI-generated research proposals publicly released by Si’s team and the ten AI manuscripts released by Sakana AI, and by generating 36 fresh proposals themselves using Si’s methodology, giving a sample of 50 AI-generated works in all. They then asked 13 specialists to try to find overlaps in methods between the AI-made works and existing papers, using a 5-point scale, on which 5 corresponded to a ‘one-to-one mapping in methods’ and 4 to ‘mix-and-match from two to three prior works’; 3 and 2 represented more-modest overlaps and 1 indicated no overlap. “It’s essentially about copying of the idea or crux of the paper,” says Gupta.

The researchers also asked the authors of original papers identified by the specialists to give their own views on the overlaps.

Including this step, Gupta and Pruthi report that 12 of the 50 AI-generated works in their sample reached levels 4 and 5, implying, they said, a plagiarism proportion of 24%; the figure rises to 18 works (36%) if cases in which the original authors didn’t reply are included. Some were from Sakana’s and Si’s work, although Gupta and Pruthi discuss in detail only the examples reported in this story.

They also said they’d found a similar kind of overlap in an AI-generated manuscript (see go.nature.com/4oym4ru) that, Sakana announced this March, had passed through a stage of peer review for a workshop at a prestigious machine-learning conference, the International Conference on Learning Representations.

At the time, the firm said that this was the first fully AI-generated paper to pass human peer review. It also explained that it had agreed with the workshop organizers to trial putting AI-generated papers into peer review and to withdraw them if they were accepted, because the community hadn’t yet decided whether AI-generated papers should be published in conference proceedings. (The workshop organizers declined Nature’s request for comment.)

Gupta and Pruthi say that this paper borrowed its core contribution from a 2015 work [6], without citing it. Their report quotes the authors of that paper, computer scientists David Krueger and Roland Memisevic, as saying that the Sakana work is “definitively not novel”, and identifying a second uncited manuscript [7] that the paper borrowed from.

Another computer scientist, Radu Ionescu at the University of Bucharest, told Nature he rated the similarity between the AI-generated work and Krueger and Memisevic’s paper as a 5.

Krueger, who is at the University of Montreal in Canada, told Nature that the related works should have been cited, but that he “wouldn’t be surprised to see human researchers reinvent this and miss previous work” too. “I think this AI system and others are not capable of achieving academic standards for referencing related work,” he said, adding that the AI paper was “extremely low quality overall”. But he wasn’t sure whether the word plagiarism should be applied, because he feels that term implies that the person (or AI tool) reusing methods was aware of earlier work, but chose not to cite it.

Pushback

The team behind The AI Scientist, which includes researchers at the University of Oxford, UK, and the University of British Columbia in Vancouver, Canada, pushed back strongly against Gupta and Pruthi’s work when asked by Nature. “The plagiarism claims are false,” the team wrote in an e-mailed point-by-point critique, adding that they were “unfounded, inaccurate, extreme, and should be ignored”.

For instance, regarding two AI Scientist manuscripts discussed in Gupta and Pruthi’s paper, the team says that the works test different hypotheses from those in the earlier papers and apply them to different domains, even if some elements of the methods are related.

The references found by the specialists for Gupta and Pruthi’s analysis are work that the AI-generated papers could have cited, but nothing more, the AI Scientist team says, adding: “What they should have reported is some related work that went uncited (a daily occurrence by human authors).” The team says it would be “appropriate” to have cited Park’s paper. In the case of Krueger’s paper and the second uncited manuscript, the AI Scientist team says, “these two papers are related, so, while it is an everyday occurrence by humans not to include works like this, it would have been good for The AI Scientist to cite them”.

Ben Hoover, a machine-learning researcher at the Georgia Institute of Technology in Atlanta who specializes in diffusion models, told Nature that he’d score the overlap with Park’s paper as a ‘3’ on Gupta’s scale. He said the AI-generated paper is of much lower quality and less thorough than Park’s work, and should have cited it, but “I would not go so far as to say plagiarism.” Gupta and Pruthi’s analysis relies on ‘superficial similarities’ between generic statements in the AI-generated work that, when read in detail, don’t meaningfully map to Park’s paper, he adds. Ionescu told Nature he would give the AI-generated paper a rating of 2 or 3.

Park judges the overlap with his paper to be much stronger than Hoover’s and Ionescu’s ratings suggest. He says he would give it a score of 5 on Gupta’s scale, adding that this “reflects a strong methodological resemblance that I consider noteworthy”. Even so, he told Nature, that does not necessarily align with what he sees as the legal or ethical definition of plagiarism.

What counts as plagiarism

Part of the disagreement could stem from different operational understandings of what ‘plagiarism’ means, especially when it comes to overlap in ideas or methods. Researchers who study plagiarism hold different views on the term from those of some of the computer scientists in the current debate, says Weber-Wulff.

“Plagiarism is a word we should and do reserve for extreme cases of intentional fraudulent cheating,” the AI Scientist team wrote, adding that Gupta and Pruthi “are wildly out of line with established conventions regarding what counts as plagiarism in academia”. But Weber-Wulff disagrees: she says that intent shouldn’t be a factor. “The machine has no intent,” she says. “We don’t have a good mechanism for explaining why the system is saying something and where it got it from, because these systems are not built to give references.”

Weber-Wulff’s own favoured definition of plagiarism is that it occurs when a manuscript “uses words, ideas, or work products attributable to another identifiable person or source without properly attributing the work to the source from which it was obtained in a situation in which there is a legitimate expectation of original authorship”. That definition was produced by Teddi Fishman, the former director of a US non-profit consortium of universities called the International Center for Academic Integrity.
