
Studies of published papers in the social sciences suggest what factors could make research findings more likely to endure. Credit: Roni Bintang/Getty
The durability of research findings can be cast in terms of three Rs. Findings should be reproducible (the same type of analysis using the same data should produce the same result); replicable (redoing an experiment to collect fresh data should produce the same result); and robust (alternative analyses using the same data should draw the same conclusion).
Over the past two decades, studies in fields from psychology to medicine have highlighted that these criteria are often not met, leading to talk of a crisis in replication and reproducibility. Four papers1–4 published this week in Nature look at the reproducibility, replicability and robustness of research in the social and behavioural sciences. They provide a snapshot of the analysed fields, and suggest factors that could make research findings more likely to endure. Researchers, funders, journals and institutions should take note — for the betterment of all science.
Three of the papers1–3 are an outcome of nearly US$8 million in funding provided in 2019 by the US Defense Advanced Research Projects Agency to the Systematizing Confidence in Open Research and Evidence (SCORE) programme. The project is run by the Center for Open Science, a non-profit organization in Charlottesville, Virginia. More than 850 researchers contributed to hundreds of duplication efforts, establishing a database of reliability markers for 3,900 papers published between 2009 and 2018 (see go.nature.com/4campyc). The fourth paper4 is the result of a series of one-day ‘replication games’ workshops organized around the world since 2022 by the Institute for Replication, a virtual, non-profit network.
Some of the results are sobering. For example, Tyner et al.1 find that statistically significant effects could be replicated for only about half of the 164 papers they studied. Moreover, the replicated effect sizes were on average less than half of what was originally reported. This ‘decline effect’ has been reported before5, but it is unclear how much is due to authors’ cognitive biases, questionable research practices, the preference of journals for eye-catching results, flukes or true effects that are specific to a particular population and time.
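One of those candidate explanations, selection for statistical significance, is enough on its own to produce a decline effect. The minimal simulation below is a hypothetical sketch rather than anything drawn from the papers discussed here: it assumes a modest true effect, ‘publishes’ only the original studies that cross the conventional significance threshold, and then replicates them with fresh data. The published estimates come out inflated, and the replications fall back towards the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2          # modest true standardized effect (assumed for illustration)
n = 50                     # participants per group in each study
n_studies = 10_000
se = np.sqrt(2 / n)        # approximate standard error of a two-group mean difference

# Original studies: estimated effect = true effect + sampling noise
original = rng.normal(true_effect, se, n_studies)

# Suppose journals publish only the 'significant' originals (z > 1.96)
published = original[original / se > 1.96]

# Independent replications of the published studies, with fresh sampling noise
replication = rng.normal(true_effect, se, published.size)

print(f"Mean published original effect: {published.mean():.2f}")
print(f"Mean replication effect:        {replication.mean():.2f}")
# The replications cluster near the true effect (~0.2), well below the inflated
# published estimates: a 'decline effect' driven purely by selection for
# significance, before any questionable research practice enters the picture.
```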
This is a reminder to treat research results with a degree of scepticism, particularly if they are surprising. That applies to their robustness, too. Aczel et al.2 found that only 74% of statistically significant conclusions from a sample of 100 papers remained significant when the same data were analysed in an alternative way. Brodeur et al.4 reached a comparable conclusion.
Some of the work analysed in these papers was done before concerns over research reliability became widespread and terms such as ‘P hacking’ (to describe the tweaking of analyses until they yield significant results) became commonly used. Awareness of the importance of transparency has only grown since then, as have the mechanisms and norms that help scientists to implement practices such as data sharing.
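To see why such practices matter, consider a minimal sketch of P hacking (hypothetical numbers, not taken from the papers above): if an analyst working with pure-noise data is free to test several outcomes and report whichever one turns out significant, the false-positive rate climbs well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def p_hacked_study(n=40, n_outcomes=5):
    """Simulate one study with no true effect, in which the analyst can
    test several outcomes and report only the smallest p value."""
    group_a = rng.normal(size=(n, n_outcomes))
    group_b = rng.normal(size=(n, n_outcomes))   # same distribution: the null is true
    p_values = stats.ttest_ind(group_a, group_b).pvalue
    return p_values.min()   # 'tweak the analysis' until something looks significant

results = np.array([p_hacked_study() for _ in range(5_000)])
false_positive_rate = np.mean(results < 0.05)
print(f"False-positive rate with 5 outcomes to choose from: {false_positive_rate:.0%}")
# With a single pre-specified outcome the rate would be about 5%; picking the
# best of five pushes it towards 1 - 0.95**5, roughly 23%.
```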
The newly published papers provide strong support for such norms. Miske et al.3 looked at the reproducibility of a random sample of 600 articles in the social and behavioural sciences. They found that only 20% included full details of data and code, although in some cases the researchers could reconstruct data sets from other public sources and analysis steps using written descriptions. When both original data and code were shared, the reproducibility rate was 91%, but this dropped to just 38% in cases in which reanalysis required reconstructing both data sets and analysis steps. Brodeur et al.4 similarly conclude that 85% of the claims of papers published in economics and political-science journals that mandate the sharing of code and data could be reproduced. They find that in 2014, only 59% of papers in these journals included files of data and code to aid replication. Between 2021 and 2023, that rate was stable at nearly 90%.
The power of journal policies and a field’s norms to produce more-reliable research raises the question of which other strategies are worth exploring. For instance, should scientists be organizing self-replication studies? Should journals hire data editors, if they have not done so already? How valuable is it to bring in further analyses (and other types of expertise) for robustness checks?
And much more work is needed before we can fully understand which approaches produce research that stands up to repeated scrutiny. For instance, what practices have the biggest potential pay-offs; what conditions predict and aid replicability and robustness; when are duplication studies worth the effort; what kind of reliability assessment is most informative; and how can broad measurements of reproducibility, replicability and robustness be refined so that they are valid across nuanced, varied studies? It will also be intriguing to explore how these insights apply to other fields, such as biomedical research and computer science.
As Brian Nosek, one of the leaders of the SCORE collaboration, points out in an interview6 with Nature, “100% replicability might imply that the work is extremely conservative and does not push the boundaries of knowledge into the unknown”. Nevertheless, current rates of replicability and reproducibility leave room for improvement. Any insight that smooths the path to reliable findings will accelerate progress. Looking back at previous work is as necessary as looking ahead. Rigorous practices for doing so demonstrate the scientific method at work.




