In April, de-identified biomedical data for 500,000 UK Biobank participants appeared for sale on an e-commerce platform owned by Alibaba, a global technology company based in Hangzhou, China. As soon as the listings were discovered, the UK Biobank and Alibaba worked together with the UK and Chinese governments to remove them before any sales occurred. The UK Biobank temporarily suspended access to its research platform, tightened up its monitoring of the data being exported from the platform and imposed bans on the academic institutions to which the data had originally been released.
A separate data breach occurred in the United States. Over the past few years, a group of researchers has bypassed restrictions to obtain de-identified data from more than 20,000 children taking part in the Adolescent Brain Cognitive Development Study — a project funded by the US National Institutes of Health (NIH). The researchers used the data to promote white supremacist views. The NIH then strengthened access requirements, added mandatory training on responsible data use and implemented compliance checks on scientists seeking to use the data.
Such breaches affect the entire research community. They could make people wary of joining studies. Meanwhile, institutions might tighten up access to their databases and reduce their reliance on international data sets.
Such responses are understandable — but they will hamper science. Genomics research requires diversity, integration and interoperability. To progress, the field must prioritize secure sharing, standardization across platforms and meaningful integration of data sets on a global scale.
Human genomics research has exploded in the past two decades. Increasingly, studies rely on large-scale, longitudinal cohorts that integrate genomic data with detailed health records and include hundreds of thousands of participants across decades. The field is also shifting beyond single reference genomes derived from a limited number of individuals and towards population-scale models, exemplified by the Human Pangenome Reference Consortium and the Chinese Pangenome Consortium. As data sets grow in scale and complexity, it becomes more important to manage them well.
Genomic data are conventionally shared globally under agreed rules. To access data from huge resources such as the Cancer Genome Atlas, researchers must share study proposals and ethics approvals, and agree to strict data-use conditions. They must then analyse the data within defined governance frameworks. This model has enabled scientists worldwide to test ideas, validate findings and contribute to a shared scientific enterprise. And when data are accessible, results can be scrutinized, making errors and misconduct easier to detect.



