record, share and value it

From short scripts to vast simulations of Earth’s climate, protein structures or even the cosmos, it is hard to imagine scientific research without software. Scientists use software in myriad ways — to plan experiments; to record, organize, analyse, visualize and archive data; to control scientific instruments; and more.

But software evolves. Most open-source software used in research is refined both iteratively and collectively, and has no published ‘version of record’. Updates can target various versions and releases, meaning that each aspect of the software — the project as a whole, a specific version or a single file — can require a different way to refer to it. This creates confusion.
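
As an illustration (ours, not part of any cited proposal): the intrinsic identifiers used by the Software Heritage archive, known as SWHIDs, already distinguish these granularities by object type. The short Python sketch below simply assembles the three forms; the hashes are placeholders rather than real archived objects.

# Illustrative only: Software Heritage identifiers (SWHIDs) give each
# granularity of a software artefact its own object type. The hashes
# here are placeholders, not real archived objects.
placeholder_hash = "0" * 40  # stands in for a SHA1-like core identifier

swhid_whole_project    = f"swh:1:snp:{placeholder_hash}"  # the project as a whole (a snapshot)
swhid_specific_version = f"swh:1:rel:{placeholder_hash}"  # one tagged release (a specific version)
swhid_single_file      = f"swh:1:cnt:{placeholder_hash}"  # the exact contents of one file

for label, swhid in [("project", swhid_whole_project),
                     ("version", swhid_specific_version),
                     ("file", swhid_single_file)]:
    print(f"{label:>8}: {swhid}")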

And so software comes with a double bind: like data, it supports the findings of a study and should be preserved and published. Yet it should also remain available and supported, and possibly be improved, over time. Scholars, librarians, research institutions and funding agencies are wrestling with how to reconcile these two requirements.

Recent efforts to do this1 have focused on adapting a set of principles initially developed2 to make research data findable, accessible, interoperable and reusable (FAIR). But this approach relies on tracking data, archiving them and making metadata available. For software, it would create a large administrative burden that would have to be sustained for decades. That would be disproportionately time-consuming for those who maintain and improve software — and whose efforts are already underappreciated. Imagine, for example, maintaining a software package that has tens of contributors (which is not rare). Each release and version requires a new upload to an archive, with updates to the metadata, author list, dependencies (any other software required for programs to run), interoperability (which other programs it can work with) and more. Some programs have a weekly or even daily release cycle, making the FAIR approach impractical.

Researchers must be able to publish a piece of software without the need for lengthy bureaucratic procedures to identify rights holders, choose an open licence and protect intellectual property.

As researchers and engineers with expertise in software development in various scientific domains — ranging from computer science to neuroscience, physics and chemistry — we have recently proposed an approach called ‘CODE beyond FAIR’ that outlines how software can be better handled, shared and maintained. Here, we outline recommendations for two groups: the scholars who develop software (see Supplementary information (SI), Table S1), and the research institutions, funders, libraries and publishers that use it (see SI, Table S2). Our recommendations are based on our collaborative experience in developing open-source software, but also draw from the work of free and open-source software (FOSS) communities. These have long experience of project governance, funding, recognizing individual contributions and training future contributors.

Train scientists to share code

Sharing the code that has been used to reach a paper’s conclusions is important for research integrity and reproducibility, but practices vary widely among research communities. Permissive licences, which let others use and modify software with few restrictions, are increasingly common, particularly in computer science, mathematics and physics, yet most software is still not published at all.

Platforms exist to share code, such as GitHub or GitLab, and to archive it, such as the repositories Zenodo and Software Heritage, which can capture the whole history of a project3.
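
To give a sense of how lightweight archiving can be, here is a minimal sketch (ours, not prescribed by the platforms) of asking Software Heritage to archive a public Git repository through its ‘save code now’ web API; the exact endpoint and response fields should be checked against the archive’s current documentation, and the repository URL is hypothetical.

# Minimal sketch: request archival of a public Git repository via the
# Software Heritage 'save code now' API. Endpoint details and response
# fields may differ from the archive's current documentation.
import requests

def request_archival(repo_url: str) -> dict:
    """Ask the Software Heritage archive to crawl and preserve repo_url."""
    endpoint = f"https://archive.softwareheritage.org/api/1/origin/save/git/url/{repo_url}/"
    response = requests.post(endpoint, timeout=30)
    response.raise_for_status()
    return response.json()  # includes the status of the save request

if __name__ == "__main__":
    # Hypothetical repository URL, used only for illustration.
    print(request_archival("https://github.com/example/example-project"))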

All researchers should know how to share and deposit code. That needn’t mean that all scientists must spend time and resources keeping abreast of this fast-paced field. Finding the right level of expertise to ensure that researchers know how to document, share and archive code in their field is crucial.

One way to improve matters is to train all PhD students from all scientific disciplines in the basics of software engineering during the first year or so of their postgraduate research careers. Institutions must embed this in all scientific curricula. Several universities, including Stanford and Harvard in the United States and Oxford and Cambridge in the United Kingdom, already offer (and in some cases, require) at least one introductory programming or computational-thinking course — even for degrees that are not scientific or technical.

International training organizations exist that teach data and computational skills to wide audiences with basic or no knowledge of software development. For example, the Neuromatch Academy for global neuroscience education — co-founded by computational neuroscientists Dan Goodman and Konrad Körding during the COVID-19 pandemic4 — reported having supported more than 2,000 students from more than 100 countries through online training in 2024. And The Carpentries, founded in 1998 by Greg Wilson to improve the computational skills of researchers5, has facilitated or organized almost 4,800 workshops in more than 70 countries so far. These courses range from basic computational skills (such as the Unix shell, the programming languages R and Python, and version control) to advanced concepts (statistics and machine learning) and discipline-specific skills.

Boost archiving processes

To increase the uptake of good practices, publishers should mandate the sharing and archiving of code at the time of publication. It is as simple as clicking a button on the Software Heritage or GitHub platforms. GitHub, now owned by Microsoft, has become the de facto international hub for code sharing.
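
For journals or authors who want to automate deposition, the programmatic equivalent of that button might look like the following sketch using Zenodo’s public REST API. This is our illustration rather than an existing publisher workflow; the access token and file name are placeholders, and endpoint behaviour should be checked against Zenodo’s developer documentation.

# Hedged sketch: create a Zenodo deposition and upload a code snapshot
# (for example, a zipped repository) through the public REST API.
# The access token and file name are placeholders.
from pathlib import Path
import requests

ZENODO_API = "https://zenodo.org/api/deposit/depositions"
ACCESS_TOKEN = "YOUR-ZENODO-TOKEN"  # placeholder; never hard-code a real token

def deposit_snapshot(archive_path: str) -> str:
    """Create a draft deposition and attach a code archive to it."""
    # Step 1: create an empty deposition record.
    created = requests.post(ZENODO_API, params={"access_token": ACCESS_TOKEN}, json={})
    created.raise_for_status()
    bucket_url = created.json()["links"]["bucket"]

    # Step 2: upload the snapshot to the record's file bucket.
    file_name = Path(archive_path).name
    with open(archive_path, "rb") as handle:
        uploaded = requests.put(f"{bucket_url}/{file_name}",
                                data=handle,
                                params={"access_token": ACCESS_TOKEN})
    uploaded.raise_for_status()
    return created.json()["links"]["html"]  # landing page of the draft record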

Institutions should support efforts to connect portals to ensure adequate cross-referencing between different projects and versions, such as those in the European Open Science Cloud research-support platform.
