
AlphaFold is now capable of predicting homodimeric complexes, including those formed by the transcription elongation factor Eaf, whose N‑terminal region is shown here. Credit: Google DeepMind/EMBL-EBI (CC-BY-4.0)
A database containing the predicted structures of nearly every known protein on Earth has grown even larger and become more useful for understanding how the building blocks of life work together.
For the first time, the AlphaFold protein-structure database will include predictions of complexes of proteins — with the addition of 1.7 million ‘homodimers’ comprising two interacting strands of the same molecule.
The freely available database, maintained by the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK, currently holds around 200 million predictions of individual protein structures, made using the AlphaFold2 AI tool, developed by London-based firm Google DeepMind.
Since its release in 2021, this repository has become a bedrock in discovery and a first port of call for research projects that try to understand life at the molecular level. But previous iterations of the database lacked predictions of how proteins form complexes, which can be indispensable for their function. For instance, HIV-1 protease — a viral protein that is a key drug target — works only when two copies of the same protein form a working enzyme.
Such proteins were already included in the database as individual ‘monomers’ but their entries tell only part of their story. “We thought, ‘can we bring the AlphaFold database to the next level, where we can include a lot of complex predictions across the tree of life?’” says Martin Steinegger, a computational biologist at Seoul National University in South Korea, who was part of the effort.
Complex interactions
To make predictions for even small complexes of two proteins was a crucial challenge, says Steinegger. “It is quite a different beast than monomer predictions.” Protein-complex predictions are exceedingly intensive computationally, so a consortium — including Steinegger’s lab, EMBL-EBI, Google DeepMind and chipmaker NVIDIA in Santa Clara, California — was formed to take on the challenge.
The consortium focused on protein complexes from 20 of the most studied species, including humans, mice, yeast and bacteria that cause disease in humans, such as Mycobacterium tuberculosis.



