Each year, the code-sharing platform GitHub releases its ‘State of the Octoverse’ report, which among other things ranks the popularity of programming languages. The latest report, released in October 2024, had some good news for Pythonistas, as Python programmers are called: for the first time in ten years, JavaScript had been knocked off the top of the leaderboard and replaced by Python.
“This is the first large-scale change we’ve seen in the top two languages since 2019 — and it speaks to the rise in Python that’s accompanied the generative [artificial intelligence] boom we’ve seen over the past two years,” the report says.
For researchers who have watched the growing fusion of science and coding, that news perhaps answers a basic but rarely asked question: with so many programming languages to choose from, which one should I learn?
But the choice is not that simple.
“For a very long time in computer science, a lot of people who work in programming languages have had the ostensible goal of [creating] the ‘one language to rule them all’,” says Rob Patro, a computational biologist at the University of Maryland in College Park. But that’s a bit like a carpenter who, armed only with a hammer, treats everything as a nail: different situations might call for different tools, and there is no single ‘best’ language.
Nature asked computer scientists and bioinformaticians what advice they would give to researchers who recognize the need to pick up some coding skills but don’t know where to start. Here are four key questions to help you decide.
What do you mean by ‘programming’?
Some researchers build tools, others use them. Both are ‘programmers’, but the style of programming and the skills required are different.
“Somebody has to make the lathe; somebody has to make the electron microscope,” says Greg Wilson, a software engineering manager at Plotly, a company in Montreal, Canada, that develops interactive graphing tools. “But most scientists don’t need to know how to do that — they need to know how to use those tools, not how to make them from scratch.”
The computational ‘lathe’ in this analogy is software designed to solve a given problem accurately and efficiently — say, aligning DNA-sequencing readouts to a reference genome. The code to do that is often mathematics-heavy and memory-intensive; it can require multiple processors working in tandem; and it is often written in languages such as C/C++, Rust and Fortran. These are ‘compiled’ languages — they require a compilation step to translate human-written code into instructions the computer can execute, and demand a deeper understanding of how computers work, but they produce fast, highly optimized software.
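To make the distinction concrete, here is a deliberately naive Python sketch of the kind of job such a tool performs: finding where a short read matches a reference sequence while tolerating a few mismatched bases. The function name align_read, its parameters and the toy sequence are illustrative assumptions, not taken from any real aligner. The sketch checks every possible position, which is fine for a 17-letter ‘genome’ but hopeless for a human genome of roughly three billion bases; production aligners rely on precomputed genome indexes and optimized compiled code, which is why they tend to be written in the languages listed above.

```python
def align_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Return 0-based positions where `read` matches `reference` with at most
    `max_mismatches` substituted bases (insertions and deletions are ignored)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for offset, base in enumerate(read):
            if reference[start + offset] != base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences: give up on this position
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# A toy 17-base "reference genome", made up for illustration.
reference = "ACGTACGTTAGCACGTA"
print(align_read("ACGT", reference, max_mismatches=0))  # → [0, 4, 12]
print(align_read("ACGA", reference, max_mismatches=1))  # → [0, 4, 12]
```

Every read is compared against every position, so the work grows with the product of read length and genome length; avoiding that cost is what the indexing and low-level optimization in real tools are for.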
Most scientists, however, are data wranglers who want, for instance, to quantify gene expression by aligning RNA sequences to a reference genome rather than building a tool to do the alignment. This data workflow is typically accomplished using ‘scripting’ languages such as Python, R or Matlab, often in concert with computational notebooks such as Jupyter, Quarto or marimo (see ‘A notebook for reproducible Python code’). Such languages skip the separate compilation step: a program called an interpreter reads the code and executes it line by line. That makes the workflow interactive and easy to learn (type a command, get a result, repeat), but relatively slow, because the interpreter has little opportunity to optimize the program as a whole before running it.
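A minimal sketch of that data-wrangling step might look like the following, assuming the reads have already been aligned and each one is summarized by a single start coordinate. The gene names, coordinates and read positions are invented for illustration, and a real analysis would use established packages rather than hand-written loops; the point is the interactive style, in which each line can be typed into a notebook cell and its result inspected immediately.

```python
# Hypothetical gene annotations: name -> (start, end), 0-based, end excluded.
genes = {"geneA": (0, 100), "geneB": (150, 300), "geneC": (400, 450)}

# Hypothetical start positions of RNA reads aligned to the same chromosome.
read_starts = [5, 42, 99, 160, 170, 299, 310, 420]


def count_reads_per_gene(read_starts, genes):
    """Count each read towards every gene whose interval contains its start."""
    counts = {name: 0 for name in genes}
    for pos in read_starts:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


print(count_reads_per_gene(read_starts, genes))
# → {'geneA': 3, 'geneB': 3, 'geneC': 1}  (the read at 310 falls between genes)
```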
Web interfaces that help to make those tools broadly available to users are often written in JavaScript, and the databases underlying those interfaces are typically queried using a different language, such as SQL. And then there are tools that tie these pieces together, which is another form of programming. You can do a lot of data manipulation at the text-based command line, for instance, and workflow languages such as Snakemake and Nextflow make it easy to string tools together into sophisticated computational pipelines.
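Snakemake and Nextflow each have their own syntax, which is not reproduced here. Instead, the Python sketch below illustrates the core idea they share: each step declares which other steps it depends on, and a runner works out a valid order in which to execute them. The step names and the run_pipeline helper are hypothetical; the ordering comes from graphlib, a module in Python's standard library since version 3.9. Real workflow managers go much further, tracking input and output files, skipping steps whose results are already up to date and distributing work across computing clusters.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run every step only after all the steps it depends on have finished.

    steps: step name -> function that takes a dict of upstream outputs.
    dependencies: step name -> set of names of the steps it needs.
    Every step, including those with no dependencies, must appear in both dicts.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    outputs = {}
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = steps[name](upstream)
    return order, outputs


# Hypothetical four-step pipeline working on toy "reads".
steps = {
    "load_reads": lambda up: ["ACGTTT", "ACGA", "TTTTGG"],
    "trim": lambda up: [read[:4] for read in up["load_reads"]],
    "total_bases": lambda up: sum(len(read) for read in up["trim"]),
    "n_reads": lambda up: len(up["trim"]),
}
dependencies = {
    "load_reads": set(),
    "trim": {"load_reads"},
    "total_bases": {"trim"},
    "n_reads": {"trim"},
}

order, outputs = run_pipeline(steps, dependencies)
print(order)  # "load_reads" and "trim" come first; the last two may swap
print(outputs["total_bases"], outputs["n_reads"])  # → 12 3
```

If the declared dependencies contain a cycle, graphlib raises a CycleError before any step runs, which is a useful early warning that a pipeline has been mis-specified.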
What are your colleagues using?
For many programming tasks, almost any language will do. But for beginners, it’s good to choose one that a colleague can help with. Furthermore, if everyone in your field is using a particular language, it helps to be using the same one, too.
Edoardo Saccenti, who specializes in systems-data analysis and applied statistics at Wageningen University & Research in the Netherlands, has a good command of multiple programming languages. For transcriptome analysis, he uses R. “All of the most-used packages and tools have been developed in R,” he explains. But Saccenti also studies psychometrics, a branch of psychology that evaluates how psychological traits are measured and quantified. In that case, he turns mostly to Matlab. “I’ve never seen a psychometry paper written in Python,” he says.
Which tools are available?
Coders can extend the core capabilities of programming languages using ‘libraries’ — collections of software routines that provide further functions. Many popular machine-learning libraries were developed in Python; the Bioconductor collection of bioinformatics tools works in R; and alevin-fry, a tool for processing single-cell RNA-sequencing data, was written in Rust.
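The same principle operates on a small scale inside Python's own standard library. In the sketch below, counting the short overlapping subsequences of length k (k-mers) in a DNA string takes a single line, because the collections module already provides a Counter type; the kmer_counts function and the example sequence are illustrative assumptions. Collections such as Bioconductor apply the idea at far larger scale, packaging routines that someone else has already written and tested.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = kmer_counts("ACGTACGA", 3)
print(counts["ACG"])  # → 2 (at positions 0 and 4)
print(len(counts))    # → 5 distinct 3-mers
```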