
Soon after US President Donald Trump came to office in January, his administration began cutting funding for scientific research in areas it sees as related to ‘woke’ ideologies. Grants related to transgender health care, diversity, equity and inclusion (DEI) and health disparities in minority groups have been some of the most targeted.
It’s impossible to know what the impact of these cuts will be on the science of the future: the research that might have emerged but that will now not exist. The administration would argue that’s a good thing; tax dollars are no longer being spent on projects it sees as antithetical to the US way of life.
A machine-learning analysis by Nature Index attempts to give a sense of the value of the research that might have been lost, by trying to reproduce the rationale the National Institutes of Health (NIH) used to cancel grants, then applying that method to science that was in the pipeline around ten years ago.
There are, of course, limitations to the insights that such an analysis can provide. A decade ago, science was done in a different political context: it’s impossible to know what grants the Trump administration would have cancelled if it were in power back then. The model used was also trained on a relatively small amount of data and simply considers patterns and associations in grants cancelled this year.
The work suggests, however, that some highly impactful science — from breakthroughs in mapping the human genome to life-saving cancer-screening techniques — might have been at risk of not being funded had a similar process been followed a decade ago.
The exact method that the Trump administration has used to identify which research grants to cancel is a mystery. But, however the decisions have been made, the cuts have been deep, amounting to at least US$4 billion so far from just the NIH and National Science Foundation (NSF), the two largest federal research funders in the United States.
Many of these cancellations have been tracked by Grant Witness, a website set up for this purpose. They have generally focused on research related to DEI, race equity and gender studies, although other tranches of cancellations have targeted institutions — such as Harvard University in Cambridge, Massachusetts. Grants on other topics that were awarded as part of a racial-justice or equality initiative were also targeted.
For grants cancelled on the basis of their research topic, bibliometrics researchers have suggested that the administration circulated a list of 200 or so words or phrases — including ‘transgender’, ‘DEI’ and ‘disparities’ — that automatically flag grants for review. A similar list has been circulated to federal departments and these terms have been removed from messaging and websites, as reported by The New York Times in March.
More clues might be found in an account given by an ex-employee at the Department of Government Efficiency (DOGE) — the body set up by the Trump administration to cut public spending seen as excessive. The employee told investigative website ProPublica in June that he had used generative artificial intelligence (AI) to assess contracts for cancellation at the US Department of Veterans Affairs.
At scale
The lack of a transparent methodology for identifying grants for review was discussed in a June court hearing, at which a federal judge ordered the government to restore some of the cancelled grants (that ruling has since been superseded by a Supreme Court decision that allows the cuts to go ahead).
“I think what may have been happening is that somebody, not a scientist, was sitting behind their computer and searching for keywords that they didn’t like,” says Scott Delaney, an environmental-health researcher and one of the founders of Grant Witness. Grants were then cancelled without regard to the research involved or the impact they might have had, he adds. Delaney worked at Harvard until last week, when he resigned because of the impact of the cancellations on his research.
In an attempt to reproduce the methodology for identifying NIH grants to cut, Nature Index’s editorial team worked with data scientists at Nature Research Intelligence, which manages the Nature Index database. The idea was to train a machine-learning algorithm using information on cancelled grants from the NIH’s Reporter database, Digital Science’s Dimensions database and Grant Witness. (Nature Index’s editorial team is editorially independent of Nature Research Intelligence, which is part of Springer Nature.)
The machine-learning model looked at keywords in the titles and abstracts of cancelled grants listed on Grant Witness to get an idea of what sorts of grant were targeted (see Supplementary information for methodology). It also considered the size of the grant and how much of the funding period was left.
After training and evaluation, we deleted the training data and applied the model to all active NIH grants listed in the Dimensions database at the start of the year, assigning each a predicted risk of cancellation on the basis of these features. The model achieved an overall accuracy of 90%: nine times out of ten, it correctly predicted whether or not a grant had been cancelled. Of the grants that we know were cancelled, it correctly flagged 70%.
According to the model, the phrases and words that were most likely to lead to cancellation included ‘gender-affirming care’, ‘assigned male at birth’, ‘affirming care’, ‘racial justice’, ‘LGBTQ’ and ‘hate speech’.
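Nature Index has not published the code behind the analysis, but the approach described above — keyword features from grant titles and abstracts, plus grant size and remaining funding period, evaluated on accuracy and recall, with the most predictive phrases read off the fitted weights — maps onto a standard text-classification pipeline. The Python sketch below is purely illustrative of that kind of pipeline, not the team's actual method: the file grants.csv and its column names (title, abstract, award_amount, fraction_period_remaining, cancelled) are hypothetical stand-ins.

```python
# Illustrative sketch of a grant-cancellation classifier of the kind described
# above (NOT Nature Index's actual code). File and column names are hypothetical.
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# One row per grant: title/abstract text, award size, fraction of the funding
# period remaining, and a label indicating whether the grant was cancelled.
grants = pd.read_csv("grants.csv")
grants["text"] = grants["title"].fillna("") + " " + grants["abstract"].fillna("")

features = ColumnTransformer([
    ("keywords", TfidfVectorizer(ngram_range=(1, 3), min_df=5, stop_words="english"), "text"),
    ("numeric", StandardScaler(), ["award_amount", "fraction_period_remaining"]),
])
model = Pipeline([
    ("features", features),
    # class_weight="balanced" because cancelled grants are a small minority
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_train, X_test, y_train, y_test = train_test_split(
    grants, grants["cancelled"], test_size=0.2,
    stratify=grants["cancelled"], random_state=0,
)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("overall accuracy:", accuracy_score(y_test, pred))    # share of all grants classified correctly
print("recall on cancelled:", recall_score(y_test, pred))   # share of cancelled grants correctly flagged

# Phrases whose weights push most strongly towards 'cancelled'
names = model.named_steps["features"].get_feature_names_out()
coefs = model.named_steps["clf"].coef_[0]
for i in np.argsort(coefs)[::-1][:10]:
    print(names[i], round(coefs[i], 2))
```

The final loop lists the n-grams with the largest positive weights, which is one simple way of surfacing phrases such as those reported above.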
History in the making
To get an idea of how such cuts can affect science, we applied the model to NIH grants that were active in 2014. We then identified papers that were funded by those grants and ranked them by the number of citations that they had accumulated (see ‘Lost effort’ and Supplementary information).
The algorithm found 1,287 grants — worth $1.9 billion — that it considered likely to be cancelled, of a total of around 48,000. Around 53,000 publications are related to those flagged grants. Although the grants were active in 2014, some had begun several years before that, and many of the associated papers were therefore published before 2014. The oldest grant — to train postdocs in translational neuroscience — started in 1977, and some of the papers date back to 2002.
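For readers who want a sense of how the second step could work in code, the continuation below extends the earlier sketch: it reuses the fitted model to score grants active in 2014, then ranks the publications linked to flagged grants by citation count. Again, the file names grants_2014.csv and grant_publications.csv and their columns are hypothetical; links of this kind would in practice come from a bibliometric database such as Dimensions.

```python
# Continuation of the earlier sketch (reuses the fitted `model`). File and
# column names are hypothetical placeholders, not the real data set.
import pandas as pd

# NIH grants active in 2014, with the same columns the model was trained on
grants_2014 = pd.read_csv("grants_2014.csv")
grants_2014["text"] = grants_2014["title"].fillna("") + " " + grants_2014["abstract"].fillna("")
grants_2014["risk"] = model.predict_proba(grants_2014)[:, 1]
flagged = grants_2014[grants_2014["risk"] >= 0.5]

# Grant-to-publication links: one row per (grant_id, publication) pair,
# with a citation count for each paper
pubs = pd.read_csv("grant_publications.csv")
lost = pubs.merge(flagged[["grant_id"]], on="grant_id")

top = (lost.drop_duplicates("publication_id")
           .sort_values("citations", ascending=False)
           .head(20))
print(f"{len(flagged)} grants flagged; {lost['publication_id'].nunique()} linked papers")
print(top[["publication_id", "citations"]])
```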
The results show the damage that cuts in funding can do to research, and the unpredictable nature of the research process. Although the model was trained on grants that were ostensibly cancelled for ideological reasons, papers that ensued from similar grants that were active in 2014 were not all about transgender health or health inequities. Rather, they encompassed a wide range of research fields and topics.
For example, one grant that the model predicted would have been cancelled, partly on the basis of its repeated mentions of ‘diversity’, led to a paper1 that described a software technique to help genetics researchers to identify chimaeras — DNA sequences that can confound analyses of large pools of genetic material. The paper was among the most-cited in the analysis, with 10,400 citations.
“It is quite common in biology for rather ho-hum methods to be more highly cited than important discovery papers. Everyone needs test tubes and pipettes,” according to Robert Edgar, an independent scientist and lead author of the chimaera paper.
Other highly cited studies that might not have existed if their grant had been cancelled include a seminal paper showcasing the results of the Human Microbiome Project2, which received a $10-million package over five years to increase understanding of the human microbiome. The project was probably flagged by the algorithm because three of its supporting grants referenced the diversity of genetic populations.
“It’s our whole field! Dammit,” says Ruth Ley, one of the co-authors of the human-microbiome paper and now the director of the department of microbiome science at the University of Tübingen in Germany.
“That particular grant was a big multicentre thing, but it had amazing trickle-down effects,” she says. “The Human Microbiome Project was an enormous consortium — anyone who wanted to work with the data could.”
This meant, Ley says, that they needed to build a standardization process for how the data were shared and stored, what analysis techniques were used, and how the papers were written and published — all funded by that grant. “What came out of it for the whole field was how we should be doing this stuff,” she says.
“It wouldn’t have happened without that grant. No way. No way.”
Wider impact
Another grant flagged by the model was one that, in 2009, funded the long-term running of the Clinical and Translational Science Institute, part of the University of California, Los Angeles (UCLA), to the tune of $57 million. The institute aims to train clinical scientists to run studies and ultimately to bring research into health care. The model found keywords in the grant, such as ‘expression’, that increased the chances of cancellation, as did the size of the grant.
The institute went on to produce more than 3,000 publications. Among the most highly cited of them, with 9,674 citations, was a 2011 study3 that found that lung-cancer screening using low-dose computed tomography (CT) scans was more effective than use of conventional radiography, and reduced death rates by 20%.