Computational design of metallohydrolases | Nature

December 4, 2025


Metallohydrolases catalyse some of the most difficult hydrolysis reactions in biology by using their bound metal ions to activate a water molecule positioned adjacent to the substrate bond to be cleaved^16,17,18. Engineering new metallohydrolases is currently of considerable interest for degrading human-generated environmental pollutants, for which there has not been sufficient time for efficient hydrolytic enzymes to evolve^19,20,21. Protein engineering has expanded the scope of substrates that can be hydrolysed by metallohydrolases, but this often requires initial promiscuous activity^22,23. De novo enzyme design has been used to generate new metallohydrolases^6,10,24, but these have had relatively low activity and efficiency, and have required extensive directed evolution to match the activity and efficiency of native enzymes²⁴. Given an ideal metallohydrolase active site, de novo enzyme design seeks to identify or generate a protein scaffold that positions the catalytic residues, metals, and substrates in optimal catalytic geometries with high accuracy^25,26. RFdiffusion has been used successfully to scaffold active sites, but the search has been limited by the need to specify the sequence positions and conformations of the catalytic residues^8,9,27.

We reasoned that a generative artificial intelligence design method that only required the specification of side-chain functional group positions around a reaction transition state, and was capable of sampling over all possible sequence positions and conformations of these residues, could more readily satisfy complex catalytic constraints^14,15,28,29. We set out to develop such an approach, and used it to design new metallohydrolases starting from a quantum chemistry-generated active site description with a bound metal cofactor.

To enable sequence-position and side-chain rotamer-agnostic enzyme design, we developed a generative artificial intelligence flow-matching model called RFdiffusion2³⁰. RFdiffusion2 extends the capabilities of RFdiffusion to generate scaffolds that position a set of functional residues (a ‘motif’) in two key ways. First, it enables atomic substructure scaffolding: RFdiffusion can only scaffold backbone-level motifs (with the side-chain and backbone atom N-Cα-C=O positions specified), whereas RFdiffusion2 can scaffold arbitrary atom-level motifs (any subset of amino acid heavy atoms). This is important for enzyme design because it allows users to specify only the positions of the key functional groups that interact with the reaction transition state, rather than the full side-chain and backbone conformation. Second, RFdiffusion2 enables sequence-position-agnostic scaffolding: RFdiffusion requires specification of the primary sequence positions of the motif residues, but RFdiffusion2 can scaffold motifs whose primary sequence positions are unknown. RFdiffusion2 replaces diffusion with flow matching^31,32 and achieves sequence-position-agnostic atomic substructure scaffolding by providing randomly selected native atomic coordinates (but not their sequence positions) during training in addition to the partially noised, sequence-labelled atomic coordinates. With these improvements, RFdiffusion2 generates diverse proteins starting directly from catalytic configurations that consist of input functional group positions and substrate coordinates. Allowing the model to resolve the a priori unknown degrees of freedom (that is, the primary sequence positions and side-chain rotamer conformations of the catalytic residues) is considerably more effective at generating self-consistent design solutions than randomly sampling those degrees of freedom before inference, as was necessary with RFdiffusion, because the space is far too large to enumerate. RFdiffusion2 training and benchmarking results for a wide range of active site scaffolding problems are described in detail elsewhere³⁰.
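As a rough illustration of the flow-matching idea, the sketch below uses the common linear-interpolation formulation on toy coordinates. It is a toy under stated assumptions, not RFdiffusion2's training or inference code: the function names, the straight-line path, the oracle velocity field standing in for a trained network, and the choice to hold motif coordinates fixed by index are all illustrative.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the linear path: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, fixed, n_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1, holding `fixed` indices constant."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        v = velocity_fn(x, step * dt)
        x = [xi if i in fixed else xi + dt * vi
             for i, (xi, vi) in enumerate(zip(x, v))]
    return x

# Toy example: index 0 plays the role of a motif coordinate that stays fixed,
# while the remaining coordinates start from Gaussian noise.
random.seed(0)
data = [1.0, -2.0, 0.5, 3.0]
noise = [data[0]] + [random.gauss(0.0, 1.0) for _ in data[1:]]

def oracle(x, t):
    """Hypothetical stand-in for a trained network: exact straight-line velocity."""
    return [(d - xi) / (1.0 - t) for xi, d in zip(x, data)]

sample = euler_sample(noise, oracle, fixed={0})
```

In a real model the oracle would be replaced by a network trained to regress `target_velocity` at points produced by `interpolate`.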

As an initial test of RFdiffusion2, we chose to design a zinc metallohydrolase for the hydrolysis of a fluorogenic ester, 4-methylumbelliferyl phenylacetate (4MU-PA), as a target reaction (Fig. 1a). We began by using density functional theory (DFT) to identify the transition-state geometry of the rate-determining Zn(II)-OH nucleophilic attack on the substrate ester. Four distinct catalytic arrangements based on the stereochemistry of the tetrahedral intermediate and the nature of the oxyanion hole were considered (Fig. 1b, Supplementary Figs. 1 and 2, Supplementary Data 1 and Supplementary Methods 4.1). These calculations provide the coordinates of the three Zn(II)-binding imidazole rings, the metal, and the transition state. Our previous RFdiffusion approach required the backbone coordinates and residue positions as inputs, which would require upfront sampling of the rotameric states and sequence position for each histidine. This cannot be done exhaustively: even with relatively coarse sampling around the side-chain chi angles χ₁, χ₂, and the backbone torsion angle ψ, there are on the order of 10¹⁸ possible choices for the side-chain conformations and sequence placements of our full catalytic site (Fig. 1c and Extended Data Fig. 1). Whereas each RFdiffusion run has to be initialized with a specific (and generally randomly selected) choice from this enormous set of combinations, RFdiffusion2 as described above searches the entire space in each trajectory.
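To give a feel for why this space cannot be enumerated, the sketch below counts joint choices under a simple assumed model: each catalytic residue independently picks one of a fixed number of bins for each of χ₁, χ₂ and ψ, and the residues occupy distinct sequence positions. The bin count, chain length and residue count in the example are hypothetical and are not the values behind the estimate quoted in the text.

```python
from math import perm

def count_placements(n_residues, chain_length, bins_per_torsion, n_torsions=3):
    """Count joint (conformation, sequence-position) choices for a motif.

    Assumes independent, uniformly binned torsions per residue and distinct
    sequence positions assigned to labelled residues.
    """
    conformations = (bins_per_torsion ** n_torsions) ** n_residues
    positions = perm(chain_length, n_residues)  # ordered choices without repeats
    return conformations * positions

# Hypothetical inputs: 3 histidines, a 150-residue chain, 12 bins (30 degree steps).
n = count_placements(n_residues=3, chain_length=150, bins_per_torsion=12)
print(f"{n:.2e} combinations")
```

Because the count grows as a power of the number of residues, adding further catalytic residues or finer bins inflates it by many orders of magnitude.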

RFdiffusion2 inference trajectories were used to build protein scaffolds housing the DFT-generated minimal active site configurations, referred to as theozymes^2,33. Several snapshots from a representative trajectory are shown in Fig. 1d, transforming random noise on the left into the final backbone on the right (Supplementary Video 1). The Cα atoms of each residue (shown as coloured spheres representing final sequence position) are initially sampled from a Gaussian distribution, and the target functional atom positions (shown in sticks) stay fixed. As the trajectory proceeds from left to right, the global structure takes shape around the motif, with the fixed histidine side chains eventually connecting to Cα atoms of the protein backbone at sequence positions of the network’s choosing. A total of 5,120 RFdiffusion2 inference trajectories were carried out starting from different random seeds and for each of the resulting protein scaffolds, sequences were generated using ProteinMPNN³⁴. The catalytic geometry and interactions with the transition state of those designs for which the AlphaFold2³⁵ predicted structure was close to the design model were further optimized using iterative LigandMPNN³⁶ and constrained Rosetta repacking and minimization³⁷ (Extended Data Fig. 2 and Supplementary Methods 4.1). Designs containing a proposed general base positioned to activate the water molecule (that is, Glu, Asp or His within hydrogen bonding distance of the Zn(II)-bound water) and side-chain hydrogen bonds stabilizing the transition-state oxyanion (if applicable), and that AlphaFold2 predicted to adopt the target structure, were characterized with PLACER¹² to assess active site preorganization. A total of 96 designs were selected for experimental characterization on the basis of predicted active site geometry and preorganization (Supplementary Fig. 3, Supplementary Data 2 and 3 and Supplementary Methods 4.1).

Linear DNA fragments encoding the 96 designs were cloned into a plasmid encoding a C-terminal Strep-tag and used to transform Escherichia coli, and the resulting proteins were purified using Strep-tag affinity chromatography. Eighty-six out of ninety-six designs were expressed and soluble as judged by SDS–PAGE analysis of the eluants (Supplementary Fig. 4). Purified designs were supplemented with zinc sulfate, and hydrolysis of 4MU-PA was monitored by fluorescence. Five designs (A1, A8, B9, C4 and F7) had activity well above background (Fig. 2b and Supplementary Fig. 5). Sequence-verified single clones for each of these were expressed and purified by affinity chromatography followed by size-exclusion chromatography to obtain pure, monomeric protein fractions (Supplementary Figs. 6 and 7 and Supplementary Table 1). Michaelis–Menten kinetic characterization of the purified variants revealed a k_cat/K_M of 16,000 ± 2,000 M⁻¹ s⁻¹ for A1, the most active design, and k_cat/K_M values in the range of 35–140 M⁻¹ s⁻¹ for the other four designs (Fig. 2c,d, Extended Data Fig. 3 and Extended Data Table 1). For comparison, the k_cat/K_M of previously designed metallohydrolases²⁴ ranged from 3 to 60 M⁻¹ s⁻¹ (Supplementary Table 2). A1 is also a relatively robust enzyme, and retains activity for at least 1,000 turnovers (Fig. 3e and Supplementary Fig. 8). A1 differs considerably from previously described proteins: the most similar structures identified through template modelling (TM) alignment with the Protein Data Bank (PDB) and AlphaFold Protein Structure Database (AFDB) have TM scores³⁸ of 0.41 and 0.49, respectively, and do not have analogous arrangements of catalytic residues (Extended Data Fig. 4a,b). We refer to A1 as zinc metalloesterase 1 (ZETA_1) throughout the remainder of the text.

**Fig. 2: Activity characterization and PLACER preorganization assessment.**

**Fig. 3: Characterization of ZETA_1 activity.**

Design ZETA_1 not only has remarkably high activity but was also the top-ranked design in our in silico ranking. The structure in the absence of substrate was predicted to be very close to the design model by AlphaFold2 (Extended Data Fig. 5a and Supplementary Figs. 9 and 10), and the designed active site of ZETA_1 was predicted to be highly preorganized by PLACER, with the catalytic side chains fixed in place and the substrate held closely in its designed position, adjacent to the proposed Zn(II) site. PLACER¹² is a deep neural network that, given a protein backbone containing a substrate, fully randomizes the positions of the substrate and all side chains within a 600-atom sphere, and then generates new coordinates for these groups¹²; repeated PLACER trajectories generate an ensemble of possible side-chain conformations and small molecule docks. Design ZETA_1 stood out from the other designs in both the extent of catalytic site preorganization (the catalytic side chains were largely fixed in space in catalytically competent conformations) and the positioning of the substrate–transition state in the active site (in the ZETA_1 ensemble, the substrate remained largely fixed in space in the active site, whereas in the inactive designs H7 and H8, it fluctuated considerably) (Fig. 2e–h and Supplementary Videos 2–5). Seven designs based on the same ZETA_1 backbone family were initially filtered out during the design selection phase, as they had suboptimal PLACER metrics; we retrospectively expressed and purified these designs and found that they had very low or no activity, further highlighting the utility of PLACER ensemble calculations for identifying active designs (Supplementary Fig. 11). These findings suggest that combining global structure prediction with detailed PLACER modelling of the active site provides a powerful approach to assessing the catalytic machinery and substrate binding geometry for design selection (Supplementary Fig. 10).
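One simple way to quantify how fixed in space a group is across an ensemble is the root mean square fluctuation of its atoms about their mean positions. The sketch below computes this for an ensemble supplied as plain coordinate lists; it is a generic illustration under that assumption, not PLACER's own scoring, and the function name and input layout are hypothetical.

```python
from math import sqrt

def rmsf(ensemble):
    """Root mean square fluctuation over an ensemble of conformations.

    `ensemble` is a list of models; each model is a list of (x, y, z) atoms
    in a common reference frame with the same atom order.
    """
    n_models, n_atoms = len(ensemble), len(ensemble[0])
    total = 0.0
    for a in range(n_atoms):
        # Mean position of this atom across all models.
        mean = [sum(m[a][k] for m in ensemble) / n_models for k in range(3)]
        total += sum((m[a][k] - mean[k]) ** 2 for m in ensemble for k in range(3))
    return sqrt(total / (n_models * n_atoms))

# Toy data: a tightly held atom versus one that wanders by several angstroms.
tight = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(-0.1, 0.0, 0.0)]]
loose = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(-3.0, 0.0, 0.0)]]
print(rmsf(tight) < rmsf(loose))
```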

The ZETA_1 active site consists of a primarily hydrophobic pocket with three histidines binding Zn(II) with their Nε atoms, an aspartate as a potential general base, and an asparagine that forms a hydrogen bond to the coumarin ring (Fig. 3a). As in the original theozyme model used to generate ZETA_1, the Zn(II) ion also acts as an oxyanion hole, stabilizing the developing negative charge at the transition state; there are no nearby side-chain hydrogen bond donors (Extended Data Fig. 5). Zinc is absolutely critical for ZETA_1 activity: extraction of bound Zn(II) by dialysis in the presence of the chelator 1,10-phenanthroline completely eliminated activity, and activity was subsequently restored by addition of zinc to the solution (Fig. 3f). Zinc titration experiments measured a dissociation constant (K_D) for Zn(II) of 41 ± 5 nM, which is similar to those of previous designed zinc enzymes^26,39, but higher than native zinc hydrolases^18,40,41,42, which typically have K_D values less than 10 nM.
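For intuition about what a K_D of 41 nM implies, the sketch below evaluates the standard 1:1 binding isotherm, fraction bound = [Zn]/(K_D + [Zn]). It assumes a single binding site and a known free Zn(II) concentration that is not depleted by binding; the example concentrations are hypothetical.

```python
def fraction_bound(free_zn_nM, kd_nM=41.0):
    """Fraction of protein with Zn(II) bound for a single 1:1 site."""
    return free_zn_nM / (kd_nM + free_zn_nM)

for zn in (41.0, 410.0, 4100.0):  # hypothetical free Zn(II) concentrations, nM
    print(f"{zn:7.0f} nM free Zn(II): {fraction_bound(zn):.2f} bound")
```

At a free Zn(II) concentration equal to K_D the site is half occupied, and a tenfold excess brings occupancy to about 91%.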

We carried out mutagenesis experiments to probe the contributions of the designed catalytic residues to Zn(II)-binding and catalysis (Fig. 3g–i and Supplementary Figs. 12–14). In the design model, N17 positions the substrate by hydrogen bonding with the lactone carbonyl of the coumarin moiety and could stabilize the developing negative charge on the leaving group; the N17A mutation led to an 8.1-fold decrease in k_cat/K_M (Supplementary Fig. 13). Mutation of all three metal-coordinating histidine residues to alanine simultaneously (H118A/H130A/H134A), as well as two of the three single histidine-to-alanine substitutions (H118A and H134A), completely inactivated the enzyme, as expected. Mutating the third Zn(II)-coordinating residue to alanine (H130A) resulted in a decrease of only 13-fold in k_cat/K_M, and mutation of the proposed general base D67 to alanine had little effect on k_cat/K_M and increased Zn(II)-binding affinity. These results suggest that H134/H118/H130 and H134/H118/D67 may be competing Zn(II)-binding sites owing to the close proximity of the coordinating side chains of H130 and D67, which was corroborated by Chai-1 (ref. ¹³) predictions of the protein–Zn(II)–substrate complex (Extended Data Fig. 5b,c); the D67A mutation may confine the zinc to the originally designed coordination sphere with the three histidines, which is more catalytically competent. In the H130A mutant, D67 is likely to coordinate Zn(II) and maintain binding, albeit in a less optimal binding geometry, lowering the zinc affinity and enzyme activity.

Guided by these observations, we started from new DFT theozymes explicitly containing the catalytic base, and generated protein structures scaffolding these theozymes using a newer version of RFdiffusion2 trained from random weight initialization on a threefold-larger dataset (previous versions were fine-tuned from structure prediction weights) (Fig. 4a, Supplementary Data 1 and Supplementary Methods 4.2). Designs whose Chai-1 predictions of the protein–Zn(II)–substrate phosphonate ester complex, mimicking the reaction transition state, closely matched the design models with high confidence were identified by PLACER to have highly preorganized active sites (Supplementary Figs. 15 and 16). Ninety-six such designs spanning 37 RFdiffusion2-generated backbones were selected for experimental characterization (Supplementary Fig. 17 and Supplementary Data 2 and 3). Eighty-five of the 96 designs were expressed and soluble (Supplementary Fig. 18), and 11 designs spanning 3 different folds had substantial zinc-dependent 4MU-PA hydrolysis activity (Fig. 4b,c and Supplementary Fig. 19). Michaelis–Menten analysis revealed that 5 designs had a k_cat/K_M greater than 10⁴ M⁻¹ s⁻¹ and 6 designs had a k_cat/K_M greater than 10³ M⁻¹ s⁻¹ (Fig. 4d, Extended Data Fig. 6 and Extended Data Table 1). The most active designs for each backbone had a k_cat/K_M = 53,000 ± 5,000 M⁻¹ s⁻¹ (ZETA_2), k_cat/K_M = 19,000 ± 2,000 M⁻¹ s⁻¹ (ZETA_3), and k_cat/K_M = 1,100 ± 200 M⁻¹ s⁻¹ (ZETA_4) (Fig. 4f–h and Supplementary Fig. 20). ZETA_2 has a k_cat = 1.5 ± 0.1 s⁻¹, a threefold increase over the k_cat of ZETA_1, and close to that of the metallohydrolase MID1sc10 obtained after 10 rounds of directed evolution²⁴. RFdiffusion2 enables specification of the position of the substrate relative to the centre of mass of the designed protein; for ZETA_2 and ZETA_3, the protein was centred near the phenylacetate and 4-methylumbelliferyl moieties, respectively, of 4MU-PA, resulting in opposite substrate binding modes in the design models (that is, the 4-methylumbelliferyl is exposed in ZETA_2 and the phenylacetate is exposed in ZETA_3) (Extended Data Fig. 7).
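The reported constants can be related through the Michaelis–Menten equation, v = k_cat[E][S]/(K_M + [S]). As a back-of-the-envelope sketch, dividing the ZETA_2 k_cat by its k_cat/K_M gives an implied K_M of roughly 28 µM; this value ignores the reported uncertainties and is derived here rather than quoted from the paper. The enzyme and substrate concentrations in the example are hypothetical.

```python
def mm_rate(s_M, e_M, kcat, km_M):
    """Michaelis-Menten initial rate in M/s."""
    return kcat * e_M * s_M / (km_M + s_M)

kcat = 1.5                 # s^-1, ZETA_2 (from the text)
kcat_over_km = 53_000.0    # M^-1 s^-1, ZETA_2 (from the text)
km = kcat / kcat_over_km   # implied K_M in M (derived, not a reported value)

e = 1e-7      # hypothetical enzyme concentration (100 nM)
low_s = 1e-6  # hypothetical substrate concentration, well below K_M

# Far below K_M the rate approaches (k_cat/K_M)[E][S].
approx = kcat_over_km * e * low_s
exact = mm_rate(low_s, e, kcat, km)
print(f"K_M ~ {km * 1e6:.0f} uM; exact {exact:.2e} M/s; low-[S] limit {approx:.2e} M/s")
```

This is why k_cat/K_M is the natural figure of merit when substrate concentrations are low: in that regime it alone sets the rate per unit enzyme and substrate.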

**Fig. 4: Characterization of second round designs.**

The success rate in the second design campaign was considerably higher than in the first (11 out of 96 versus 1 out of 96 designs with k_cat/K_M greater than 10³ M⁻¹ s⁻¹), supporting the conclusions from the first-round analysis (Supplementary Figs. 21–26, Supplementary Table 3, Supplementary Discussion 2 and Supplementary Methods 4.2). Circular dichroism experiments confirmed that all active enzyme scaffolds from both design campaigns possess secondary structures consistent with their design models, indicating proper folding (Supplementary Fig. 21). The structures of ZETA_1–4 are rather different from each other and from previously known metallohydrolases (Extended Data Fig. 4). The sequence positions of the catalytic residues in each of these enzymes are also very different, highlighting the diversity of RFdiffusion2-generated design solutions (Fig. 4c and Supplementary Tables 4 and 5).
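As a rough check that the difference in hit rates (11 of 96 versus 1 of 96) is unlikely to be a sampling artefact, the sketch below runs a one-sided Fisher's exact test using only the standard library. This test is not part of the reported analysis, and it treats the designs within each campaign as independent, which is an assumption given that several designs share backbones.

```python
from math import comb

def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """P(group A has at least hits_a hits) under the hypergeometric null."""
    total_hits = hits_a + hits_b
    total = n_a + n_b
    denom = comb(total, n_a)
    p = 0.0
    for k in range(hits_a, min(total_hits, n_a) + 1):
        p += comb(total_hits, k) * comb(total - total_hits, n_a - k) / denom
    return p

# Group A: second campaign; group B: first campaign (counts from the text).
p_value = fisher_one_sided(hits_a=11, n_a=96, hits_b=1, n_b=96)
print(f"round 2: {11 / 96:.1%}, round 1: {1 / 96:.1%}, one-sided p = {p_value:.1e}")
```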

We determined the structure of ZETA_2, the most active design, in the apo state at 3.5 Å using X-ray crystallography (PDB: 9PYJ; Fig. 5). The experimental structure is in good agreement with the design model, with nearly superimposable backbones (Cα root mean squared deviation (r.m.s.d.) = 1.1 Å) and the catalytic residues preorganized in the designed geometry (Fig. 5a,b). The binding pocket is complementary to the superimposed transition state from the design model (Fig. 5c). We also solved a 2.1 Å structure after soaking ZETA_2 in Zn(II) (PDB: 9PYL; Extended Data Fig. 8); whereas the backbone was again nearly superimposable with the design model (Cα r.m.s.d. = 0.8 Å) and a Zn(II) ion was present with 100% occupancy at the designed location (r.m.s.d. = 1.7 Å), one of the Zn(II)-coordinating histidines (H110) was flipped out to interact with a Zn(II) ion bound at the surface of the protein, probably because of the high Zn(II) concentration in the crystal soaking buffer (250 mM) (Extended Data Fig. 8).
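The Cα r.m.s.d. values quoted here are the root mean square of the distances between matched Cα atoms after superposition. A minimal sketch, assuming the two coordinate sets are already superimposed and listed in the same residue order (the superposition step itself, for example by the Kabsch algorithm, is omitted, and the toy coordinates are hypothetical):

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """r.m.s.d. between two pre-aligned, equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2
             for pa, pb in zip(coords_a, coords_b)
             for a, b in zip(pa, pb))
    return sqrt(sq / len(coords_a))

# Toy coordinates: every atom displaced by 1 angstrom along x.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(rmsd(design, crystal))
```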

**Fig. 5: Crystal structure of ZETA_2 closely resembles the design model.**
