Timothée Poisot, a computational ecologist at the University of Montreal in Canada, has made a successful career out of studying the world’s biodiversity. A guiding principle for his research is that it must be useful, Poisot says, as he hopes it will be later this year, when it joins other work being considered at the 16th Conference of the Parties (COP16) to the United Nations Convention on Biological Diversity in Cali, Colombia. “Every piece of science we produce that is looked at by policymakers and stakeholders is both exciting and a little terrifying, since there are real stakes to it,” he says.
But Poisot worries that artificial intelligence (AI) will interfere with the relationship between science and policy in the future. Chatbots such as Microsoft’s Bing, Google’s Gemini and ChatGPT, made by tech firm OpenAI in San Francisco, California, were trained using a corpus of data scraped from the Internet — which probably includes Poisot’s work. But because chatbots don’t often cite the original content in their outputs, authors are stripped of the ability to understand how their work is used and to check the credibility of the AI’s statements. It seems, Poisot says, that unvetted claims produced by chatbots are likely to make their way into consequential meetings such as COP16, where they risk drowning out solid science.
“There’s an expectation that the research and synthesis is being done transparently, but if we start outsourcing those processes to an AI, there’s no way to know who did what and where the information is coming from and who should be credited,” he says.
Since ChatGPT’s arrival in November 2022, it seems that there’s no part of the research process that chatbots haven’t touched. Generative AI (genAI) tools can now perform literature searches; write manuscripts, grant applications and peer-review comments; and even produce computer code. Yet, because the tools are trained on huge data sets, which are often not made public, these digital helpers can also clash with ownership, plagiarism and privacy standards in unexpected ways that cannot be addressed under current legal frameworks. And as genAI, overseen mostly by private companies, increasingly enters the public domain, the onus is often on users to ensure that they are using the tools responsibly.
Bot bounty
The technology underlying genAI, which was first developed at public institutions in the 1960s, has now been taken over by private companies, which usually have no incentive to prioritize transparency or open access. As a result, the inner mechanics of genAI chatbots are almost always a black box — a series of algorithms that aren’t fully understood, even by their creators — and attribution of sources is often scrubbed from the output. This makes it nearly impossible to know exactly what has gone into a model’s answer to a prompt. Organizations such as OpenAI have so far asked users to ensure that outputs used in other work do not violate laws, including intellectual-property and copyright regulations, or divulge sensitive information, such as a person’s location, gender, age, ethnicity or contact information. Studies have shown that genAI tools might do both1,2.
Chatbots are powerful in part because they have learnt from nearly all the information on the Internet — obtained through licensing agreements with publishers such as the Associated Press and social-media platforms including Reddit, or through broad trawls of freely accessible content — and they excel at identifying patterns in mountains of data. For example, the GPT-3.5 model, which underlies one version of ChatGPT, was trained on roughly 300 billion words, which it uses to create strings of text on the basis of predictive algorithms.
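The idea of ‘predictive algorithms’ can be made concrete with a deliberately simplified sketch: count which word most often follows each word in a training text, then string together the most likely continuations. The toy Python example below is only an illustration of that principle (the miniature corpus and the function names are invented for this sketch, and bear no relation to how GPT-3.5 or any commercial chatbot is actually built).

```python
# Toy illustration of next-word prediction: count which word most often
# follows each word in a tiny corpus, then generate text greedily.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Bigram counts: for each word, how often is each other word seen next?
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the continuation most frequently seen after 'word'."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate a short string of text by repeatedly predicting the next word.
word, output = "the", ["the"]
for _ in range(4):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # e.g. "the cat sat on the"
```

Real chatbots replace these raw word counts with neural networks that learn far subtler statistical patterns across billions of words, but the basic loop of predicting, appending and repeating is the same.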
AI companies are increasingly interested in developing products marketed to academics. Several have released AI-powered search engines. In May, OpenAI announced ChatGPT Edu, a platform that layers extra analytical capabilities onto the company’s popular chatbot and includes the ability to build custom versions of ChatGPT.
Two studies this year have found evidence of widespread genAI use in writing both published scientific manuscripts3 and peer-review comments4, even as publishers attempt to place guardrails around the technology, either by banning it outright or by asking writers to disclose whether and when they use it. Legal scholars and researchers who spoke to Nature made it clear that, when academics use chatbots in this way, they open themselves up to risks that they might not fully anticipate or understand. “People who are using these models have no idea what they’re really capable of, and I wish they’d take protecting themselves and their data more seriously,” says Ben Zhao, a computer-security researcher at the University of Chicago in Illinois who develops tools to shield creative work, such as art and photography, from being scraped or mimicked by AI.
When contacted for comment, an OpenAI spokesperson said that the company was looking into ways to improve the opt-out process. “As a research company, we believe that AI offers huge benefits for academia and the progress of science,” the spokesperson said. “We respect that some content owners, including academics, may not want their publicly available works used to help teach our AI, which is why we offer ways for them to opt out. We’re also exploring what other tools may be useful.”
In fields such as academia, in which research output is linked to professional success and prestige, losing out on attribution not only denies people compensation, but also perpetuates reputational harm. “Removing people’s names from their work can be really damaging, especially for early-career scientists or people working in places in the global south,” says Evan Spotte-Smith, a computational chemist at Carnegie Mellon University in Pittsburgh, Pennsylvania, who avoids using AI for ethical and moral reasons. Research has shown that members of groups that are marginalized in science have their work published and cited less frequently than average5, and overall have access to fewer opportunities for advancement. AI stands to further exacerbate these challenges, Spotte-Smith says: failing to attribute someone’s work to them “creates a new form of ‘digital colonialism’, where we’re able to get access to what colleagues are producing without needing to actually engage with them”.
Academics today have little recourse when it comes to directing how their data are used, or to having those data ‘unlearnt’ by existing AI models6. Research is often published open access, and it is more challenging to litigate the misuse of published papers or books than that of a piece of music or a work of art. Zhao says that most opt-out policies “are at best a hope and a dream”. And many researchers don’t even own the rights to their creative output, having signed them over to institutions or publishers, which in turn can enter partnerships with AI companies seeking to use their corpus to train new models and create products that can be marketed back to academics.
Representatives of the publishers Springer Nature, the American Association for the Advancement of Science (which publishes the Science family of journals), PLOS and Elsevier say they have not entered such licensing agreements — although some, including those for the Science journals, Springer Nature and PLOS, note that their journals do disclose the use of AI in editing and peer review and to check for plagiarism. (Springer Nature publishes Nature, but the journal is editorially independent from its publisher.)
Other publishers, such as Wiley, Oxford University Press and Taylor & Francis, have brokered deals with AI companies; Taylor & Francis, for example, has a US$10-million agreement with Microsoft. Cambridge University Press (CUP) has not yet entered any partnerships, but is developing policies that will offer an ‘opt-in’ agreement to authors, who will receive remuneration. In a statement to The Bookseller magazine discussing future plans for CUP — which oversees 45,000 print titles, more than 24,000 e-books and more than 300 research journals — Mandy Hill, the publisher’s managing director of academic publishing, who is based in Oxford, UK, said that it “will put authors’ interests and desires first, before allowing their work to be licensed for GenAI”.
Some authors are unsettled by the news that their work will be fed into AI algorithms (see ‘How to protect your intellectual property from AI’). “I don’t feel confident that I can predict all the ways AI might impact me or my work, and that feels frustrating and a little frightening,” says Edward Ballister, a cancer biologist at Columbia University in New York City. “I think institutions and publishers have a responsibility to think about what this all means and to be open and communicative about their plans.”
There is some evidence, however, that publishers are noting scientists’ discomfort and acting accordingly. Daniel Weld, a computer scientist at the University of Washington in Seattle and chief scientist at the AI search engine Semantic Scholar, has noticed that more publishers and individuals are reaching out to retroactively request that papers in the Semantic Scholar corpus not be used to train AI models.
The law weighs in
International policy is only now catching up with the burst of AI technology, and clear answers to foundational questions — such as where AI output falls under existing copyright legislation, who owns that copyright and what AI companies need to consider when they feed data into their models — are probably years away. “We are now in this period where there are very fast technological developments, but the legislation is lagging,” says Christophe Geiger, a legal scholar at Luiss Guido Carli University in Rome. “The challenge is how we establish a legal framework that will not disincentivize progress, but still take care of our human rights.”
Even as observers settle in for what could be a long wait, Peter Yu, an intellectual-property lawyer and legal scholar at Texas A&M University School of Law in Fort Worth, says that existing US case law suggests that the courts will be more likely to side with AI companies, in part because the United States often prioritizes the development of new technologies. “That helps push technology to a high level in the US when a lot of other countries are still trying to catch up, but it makes it more challenging for creators to pursue suspected infringement.”
The European Union, by contrast, has historically favoured personal protections over the development of new technologies. In May, it approved the world’s first comprehensive AI law, the AI Act. This broadly categorizes uses of AI on the basis of their potential risks to people’s health, safety or fundamental rights, and mandates corresponding safeguards. Some applications, such as using AI to infer sensitive personal details, will be banned. The law will be rolled out over the next two years, coming into full effect in 2026, and applies to models operating in the EU.
The impact of the AI Act on academia is likely to be minimal, because the policy gives broad exemptions for products used in research and development. But Dragoş Tudorache, a member of the European Parliament and one of the two lead negotiators of the AI Act, hopes the law will have trickle-down effects on transparency. Under the act, AI companies producing “general purpose” models, such as chatbots, will be subject to new requirements, including an accounting of how their models are trained and how much energy they use, and will need to offer opt-out policies and enforce them. Any group that violates the act could be fined as much as 7% of its annual global turnover.
Tudorache sees the act as an acknowledgement of a new reality in which AI is here to stay. “We’ve had many other industrial revolutions in the history of mankind, and they all profoundly affected different sectors of the economy and society at large, but I think none of them have had the deep transformative effect that I think AI is going to have,” he says.