Rafaella Rogatto De Faria was nearing the end of her PhD when her adviser proposed a fresh project. The idea was to analyse genetic, imaging and surgical-outcome data, to find biomarkers that could help to identify which people with osteoarthritis would respond best to knee-replacement surgery. De Faria, an athlete and a biomedical engineer at the University of São Paulo, Brazil, knew the profound impact of cartilage and joint injuries on people’s lives, and stayed on to pursue the project after she defended her PhD in July 2024.
She and her colleagues began by gathering data from people being treated for osteoarthritis at the university. The team has a cohort of 200 individuals so far, with data gathered over the past two years. “We are actually creating our own biobank,” De Faria says. “We don’t have this yet here.”
But the team also wanted to validate its results against a larger data set. Colleagues suggested looking to the UK Biobank, a collection of images, clinical and genetic data and physical samples from 500,000 individuals in the United Kingdom, some of whom have been studied since 2006. Within a few months of applying for access, De Faria had at her fingertips data from 40,000 people with osteoarthritis, half of whom had undergone total-knee-replacement surgeries. That’s a 200-fold increase over her original cohort. “The data were exactly what we were needing,” De Faria says.
De Faria’s data needs are not unique. Around the world, researchers studying human health often find themselves in need of more, and more-diverse, samples. They could try to collect them themselves or track down existing samples by contacting researchers in the same field of study. Alternatively, they could reach out to entities that have been specifically created to share these resources: biobanks.
Biobanks range in size from the scale of a single laboratory, such as the collection that De Faria and her adviser started, to that of the UK Biobank, one of the world’s largest such collections. Most contain both data and physical samples that researchers can request for their studies. And although some smaller banks aim to serve researchers at a single institution, large-scale initiatives such as the UK Biobank, the Mexico City Prospective Study and the All of Us initiative by the US National Institutes of Health (NIH) are designed to meet the needs of the global research community.
For many researchers, biobanks can mean the difference between a successful project and one that stalls for want of crucial data. Yet most biobanks are underutilized, with some surveys suggesting that less than 10% of banked samples get used, says health-policy researcher Amanda Rush at the University of Sydney in Australia.
In planning a biobank-enhanced project, researchers must weigh the pros and cons of creating their own collection of samples and data against the support that larger, existing biobanks can offer. They must also factor in practical considerations such as cost, data security and the legalities of shipping biological materials, Rush says.
And then there are the more strategic considerations. Some projects might be best served by having a large number of samples, but others might benefit from a bespoke collection that offers richer metadata for each specimen, Rush says. There are different scenarios for which each of these biobanks “comes to the fore”, she says.
Roots of biobanks
Biobanking as an enterprise stems from technological advances that began in the late 1980s, says Peter Watson, who leads biobanking services at BC Cancer Research, a research institute in Vancouver, Canada. Speedier DNA-sequencing technologies, faster computing and larger, more powerful databases meant that biological data could be collected and reused endlessly. But efforts to create repositories of data and samples were mostly siloed and ad hoc. “It was just sort of individual efforts in different institutions,” he says.
As a graduate student studying rare paediatric tumours in the early 1990s, Jennifer Byrne and her adviser relied heavily on tissues that had been surgically removed. The specimens were often large — on one occasion, Byrne remembers rushing to the hospital to receive a grapefruit-sized sample — and patients donated them in the hope that others with the same disease would benefit. “There were no cell lines for those cancer types, so we had to study human material,” says Byrne, who is now a molecular oncologist at the University of Sydney. The result was effectively a biobank, “but we didn’t even really realize that we were doing that”.
Although samples gathered in this way are not freely available, because of issues around consent, researchers who can demonstrate funding and ethical approval can approach the custodians for collaborations, Byrne says.
This pathway for external investigators to tap into a collection is what separates a biobank from a stash in a lab freezer, Byrne says. “Biobanks are designed to be reused for different purposes, by different people.”

A UK Biobank researcher handles frozen samples. Credit: David Guttridge/UK Biobank
Understanding the access policies early on is crucial to success, she adds. “Do they provide samples to anybody, or are they largely set up to serve the needs of a single network of researchers?” Finding smaller biobanks, with more-restricted access, can be tricky because they are generally not well advertised. In 2016, Byrne and her colleagues created a biobank registry for the Australian state of New South Wales, allowing researchers to find information about resources in their area and register their own biobanks (https://nsw.biobanking.org).
The Royal College of Surgeons in Ireland (RCSI) has taken a similar approach, curating several disease-specific collections created by individual RCSI researchers into an institutional biobanking service at its headquarters in Dublin. The biobank streamlines the process of donor consent for samples to be used in research. Researchers who wish to contribute samples can ask to have their own collections added, and would-be collaborators can apply to use the materials or data. But the clinicians who gathered the samples remain closely involved in their use.
Indeed, researchers wanting to access the materials or data should plan on collaborations rather than treating the biobank purely as a vault to extract information, says RCSI geneticist Gianpiero Cavalleri. “We want it to be used as much as possible,” Cavalleri says. “But the typical access model is in collaboration with the investigator.”
Scaling up
Despite the intent to share, tapping into a restricted-access biobank can be challenging for many researchers, Rush says. Shipping can cost up to US$50 per sample, so acquiring enough material for a large study could run to thousands of dollars. Furthermore, the complex legal and other agreements required to transport biological samples or share data securely can stymie early-career researchers or those without large pots of funding. Turning to larger biobanks can help to surmount these barriers, because they might have systems in place to help with the logistics.
Another consideration, says clinical researcher Alex Chaitoff at the University of Michigan in Ann Arbor, is the breadth of samples available. For his work, Chaitoff often uses large databases, such as the NIH All of Us biobank and the US Centers for Disease Control and Prevention’s National Health and Nutrition Examination Survey (NHANES). NHANES is the only national US health database that includes health and nutrition information for people of all ages, with roughly 5,000 participants added each year. These data sources “are much more likely to be nationally representative”, Chaitoff says. All of Us is especially valuable, he adds, because it includes groups that have been historically excluded from scientific research, such as Native American communities (The All of Us Research Program Genomics Investigators. Nature 627, 340–346; 2024). “They oversample populations that are generally undersampled in research,” he says.
Gaining access
Once researchers find a biobank with the data that they need, they must navigate issues around access — and payment.
Duniel Delgado Castillo, a biomedical engineer at the National Autonomous University of Mexico in Mexico City, was combing through the research literature on physical changes in the brains of people with long COVID when he stumbled across a trove of brain images that he desperately needed. Castillo had already tried, with little success, to reach out to the authors of various studies to access their image collections. By comparison, the resource he discovered, part of the UK Biobank, was much larger, easier to access and seemed to have exactly what he was looking for.
He applied for access early in 2024. Although the initial application suggested there would be a fee in the £3,000–£9,000 (US$4,000–$12,000) range for three years of access, the biobank offered him a grant to cover the costs. Within months, he was working with magnetic resonance imaging (MRI) scans from 1,000 participants, half of whom had long COVID; the other half were matched controls. “If I didn’t have those images, it would be impossible to continue with my investigation,” Castillo says.
Data from the UK Biobank can be used only within its own secure cloud-computing service, the Research Analysis Platform. The platform includes tools to analyse genomic and translational data and to perform statistical analyses on images and other data types, as well as machine-learning tools that can be accessed using JupyterLab, an open-source data-science system.
Training to use the biobank’s secure computing space was easy, Castillo and De Faria say. And trying to recreate it on their institutional systems would have been cumbersome and expensive, Castillo adds. “It was a relief for me because all the security of the data is built into the Research Analysis Platform,” he says. “I don’t have to worry about it.”