The ultra-high affinity transport proteins of ubiquitous marine bacteria

September 12, 2024


Identification of SBP genes

Nineteen candidate SBP genes in the genome of Ca. P. ubique strain HTCC1062 were identified through a search of the TransportDB 2.0 database⁵⁹ (http://membranetransport.org; accessed 22 January 2020). One of these genes, SAR11_0371, was annotated as a ‘possible transmembrane receptor’ in UniProt and showed a non-canonical predicted domain structure consisting of a short SBP-like domain (170 amino acids) followed by a coiled-coil domain and an unidentified C-terminal domain. Additionally, genome context analysis showed that, unlike the other ABC SBP genes in Ca. P. ubique HTCC1062, SAR11_0371 was not colocalized with genes encoding the membrane permease or ATP-binding cassette components of an ABC transport system. Thus, SAR11_0371 was considered not to represent the SBP component of an SBP-dependent transport system and was excluded from the analysis. We also attempted to identify additional SBP genes through a search of the UniProt database for proteins in Ca. P. ubique belonging to Pfam clans CL0177 (PBP; periplasmic binding protein) and CL0144 (Periplas_BP; periplasmic binding protein like); however, this search did not return any additional candidate genes.

Cloning

The protein sequence of each SBP from Ca. P. ubique HTCC1062 was obtained from the UniProt database. Signal sequences were predicted using the SignalP 5.0 server⁶⁰ and removed. The protein sequences were then back-translated and codon-optimized for expression in E. coli, and the resulting genes were obtained as synthetic DNA from Twist Bioscience or Integrated DNA Technologies. The synthetic genes were cloned into the NdeI/XhoI site of the pET-28a(+) expression vector by In-Fusion cloning using the In-Fusion HD Cloning Kit (Takara Bio), yielding expression constructs with an N-terminal hexahistidine tag and thrombin tag. Correct assembly of each expression vector was confirmed by Sanger sequencing (FASMAC). The putative csiD gene, SAR11_1354, and several homologues of the Ca. P. ubique HTCC1062 SBPs (Supplementary Table 8) were cloned similarly into the pET-28a(+) vector, except that the thrombin tag was removed from the constructs of SAR11_1354, SAR11_0266 (Fub), or SAR11_1290 (SAR324). The sequences of oligonucleotides and synthetic genes used in this study are listed in Supplementary Table 9.

Optimization of protein expression

Protein expression was initially tested in E. coli BL21(DE3) cells grown in Luria-Bertani (LB) and Terrific Broth (TB) media at 30 °C and 17 °C. SAR11_0655 showed optimal soluble expression in LB medium at 17 °C, SAR11_1203 showed optimal soluble expression in TB medium at 30 °C, and eight proteins (SAR11_0797, SAR11_0807, SAR11_0864, SAR11_1068, SAR11_1179, SAR11_1210, SAR11_1238, and SAR11_1361) showed optimal soluble expression in TB medium at 17 °C. Next, the remaining proteins were tested for expression in E. coli SHuffle T7 cells (New England Biolabs) in TB medium at 17 °C; this strain expresses the disulfide bond isomerase DsbC, which can increase soluble recombinant expression of cytoplasmic proteins by promoting correct formation of disulfide bonds. Soluble expression of SAR11_0769, SAR11_0953, SAR11_1302, and SAR11_1336 was achieved under these conditions. Due to the lack of soluble expression for the remaining four proteins (SAR11_0266, SAR11_0271, SAR11_1290 and SAR11_1346), we also tested expression of one or two close homologues of each protein (Supplementary Table 8). The SAR11_0271 homologue from ‘Ca. Pelagibacter’ sp. HIMB1321 (denoted SAR11_0271*) could be expressed in soluble form in SHuffle T7 cells in TB medium at 17 °C, while the SAR11_1346 homologue from the same species (denoted SAR11_1346*) could be expressed in soluble form in BL21(DE3) cells in TB medium at 17 °C. SAR11_0271* and SAR11_1346* share 91.4% and 88.9% sequence identity, respectively, with the corresponding proteins from Ca. P. ubique HTCC1062, and the binding site residues are completely conserved (Supplementary Fig. 5), indicating that the functions and properties of the homologous SBPs are likely to be identical. None of the homologues of SAR11_0266 or SAR11_1290 could be expressed in soluble form in BL21(DE3) or SHuffle T7 cells. Expression of SAR11_0266 and SAR11_1290 without His₆ or thrombin tags also yielded insoluble protein.

Protein expression was typically evaluated by SDS–PAGE analysis as follows. Cells transformed with the relevant expression vector by electroporation were spread from a frozen glycerol stock onto an LB agar plate containing 0.2% (w/v) glucose and 25 µg ml⁻¹ kanamycin and incubated at 30 °C overnight. The cells were then scraped into a small volume of LB medium and used to inoculate 3 ml of the relevant growth medium containing 25 µg ml⁻¹ kanamycin in a 10 ml round bottom tube at a starting OD₆₀₀ of 0.05. The culture was incubated at 37 °C with shaking at 220 rpm until the OD₆₀₀ reached 0.5. One-millilitre aliquots were transferred to clean round bottom tubes and isopropyl β-d-1-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM. The induced cultures were incubated with shaking at 220 rpm at 17 °C overnight or 30 °C for 3 h. A 500-µl aliquot of each culture was resuspended in lysis buffer (20 mM Tris, 0.5 M NaCl, 1% (v/v) Triton X-100, pH 8.0) and incubated at room temperature for 10 min. The cell lysate was centrifuged at 21,000g for 5 min (4 °C). The soluble fraction of the cell lysate was transferred to a tube containing 30 µl cOmplete His-Tag purification Ni-NTA resin (Roche) suspended in 500 µl buffer A (8 M urea, 20 mM Tris, 0.5 M NaCl, pH 8.0), while the insoluble fraction of the cell lysate was dissolved in 500 µl buffer A, centrifuged at 21,000g for 5 min, and then transferred to a tube containing 30 µl Ni-NTA resin suspended in 500 µl buffer A. In both cases, the resin was incubated at room temperature for 10 min, washed twice with 500 µl buffer A, and then eluted by incubation with 50 µl buffer B (8 M urea, 20 mM Tris, 0.5 M NaCl, 0.5 M imidazole, pH 8.0) at room temperature for 5 min. Fifteen microlitres of supernatant was mixed with 5 µl of 4× SDS–PAGE sample loading buffer and heated at 90 °C for 10 min, then loaded onto a 4–15% pre-cast SDS–PAGE gel (Bio-Rad). The gel was run at 200 V for 30 min and visualized with Coomassie Blue.

Large-scale protein expression and purification

For expression and purification of the Ca. P. ubique SBPs, E. coli BL21(DE3) or SHuffle T7 cells transformed with the relevant expression vector were spread from a frozen glycerol stock onto an LB agar plate containing 0.2% (w/v) glucose and 25 µg ml⁻¹ kanamycin, and incubated at 30 °C overnight. The cells were then scraped into 3 ml LB medium, and 500 µl of the resulting cell suspension was used to inoculate 500 ml LB or TB medium supplemented with 25 µg ml⁻¹ kanamycin in a 2 l or 3 l flask, preheated at 37 °C. The culture was incubated at 37 °C with shaking at 220 rpm until the OD₆₀₀ reached 0.5, then cooled briefly in an ice-water bath until the temperature reached ~25 °C. IPTG was added to a concentration of 0.5 mM, and the culture was incubated at 17 °C with shaking at 220 rpm for a further 16 h. Cells were pelleted by centrifugation (3,300g, 15 min, 4 °C) and frozen at −20 °C until use. For protein purification, cells were thawed on ice, resuspended in 100 ml Ni binding buffer (20 mM Tris, 500 mM NaCl, 20 mM imidazole, pH 8.0), and lysed by sonication. After addition of 500 U Benzonase Nuclease (Sigma-Aldrich) to digest DNA, the cell lysate was centrifuged at 10,000g for 1 h (4 °C). The supernatant was filtered through a 0.45-µm syringe filter and then loaded onto a 1 ml HisTrap HP column (Cytiva) equilibrated with Ni wash buffer using an ÄKTA Pure FPLC system (Cytiva). For purification under native conditions, the column was washed with 10 ml Ni binding buffer followed by 10 ml Ni wash buffer (20 mM Tris, 500 mM NaCl, 44 mM imidazole, pH 8.0), and then the target protein was eluted in 10 ml Ni elution buffer (20 mM Tris, 500 mM NaCl, 500 mM imidazole, pH 8.0).

For purification under denaturing conditions, the column was washed with denaturing Ni binding buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 20 mM imidazole, pH 8.0) at 1 ml min⁻¹ for 30 min after loading of the clarified cell lysate, and the target protein was eluted with 10 ml denaturing Ni elution buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 250 mM imidazole, pH 8.0). Proteins purified under native conditions were concentrated to 400 µl using a 10 kDa molecular weight cut-off (MWCO) Amicon Ultra-4 centrifugal spin concentrator (Merck-Millipore) and purified by size-exclusion chromatography using a Superdex 200 Increase 10/300 column (Cytiva), eluting in DSF buffer (20 mM HEPES, 0.3 M NaCl, pH 7.5). For storage, proteins were concentrated to a volume of 0.5–2 ml and glycerol was added to a concentration of 10% (v/v). The protein was then flash-frozen in 100–200-µl aliquots in liquid nitrogen and stored at −80 °C until use. ArgT from S. enterica was expressed from a pETMCSIII plasmid and purified as described previously⁶¹.

Protein refolding

In most cases, protein purified under denaturing conditions was diluted to a concentration of 0.5 mg ml⁻¹ and volume of 10–30 ml in denaturing Ni binding buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 20 mM imidazole, pH 8.0) and transferred to 10 kDa MWCO SnakeSkin dialysis tubing (Thermo Scientific). The protein was then dialysed against 2 l dialysis buffer (20 mM Tris, 150 mM NaCl, pH 8.0) at 4 °C with three buffer changes over a period of 24 h. The protein was collected and exchanged into DSF buffer using a 10 kDa MWCO Amicon Ultra-15 centrifugal concentrator, then concentrated to 400 µl and purified by size-exclusion chromatography as described above. For SAR11_1346*, an improved yield of monomeric protein was obtained using rapid dilution for refolding: 2 ml of denatured protein (5 mg ml⁻¹ in denaturing Ni binding buffer) was added dropwise with stirring to 40 ml pre-chilled refolding buffer (20 mM Tris, 150 mM NaCl, 10% (v/v) glycerol, pH 8.0) and incubated at 4 °C with stirring for 20 h. The protein was then concentrated and purified by size-exclusion chromatography as above.
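
As a rough check on the rapid-dilution step, the final protein and urea concentrations after adding 2 ml of denatured protein to 40 ml of refolding buffer can be worked out as below. This is only a sketch: it assumes the volumes are additive, and the helper name `dilute` is hypothetical.

```python
def dilute(concentration, added_volume_ml, buffer_volume_ml):
    """Concentration after mixing a stock into buffer, assuming additive volumes."""
    return concentration * added_volume_ml / (added_volume_ml + buffer_volume_ml)

# 2 ml of 5 mg/ml protein in 8 M urea, added to 40 ml of refolding buffer.
protein_mg_per_ml = dilute(5.0, 2.0, 40.0)  # roughly 0.24 mg/ml
urea_molar = dilute(8.0, 2.0, 40.0)         # roughly 0.38 M

print(f"protein: {protein_mg_per_ml:.2f} mg/ml, urea: {urea_molar:.2f} M")
```

Under these assumptions the 21-fold dilution brings the urea concentration well below 1 M while keeping the protein at about half the concentration used for dialysis-based refolding.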

Differential scanning fluorimetry

DSF experiments were performed using a StepOnePlus Real-Time PCR System and StepOne software (Applied Biosystems) based on literature protocols⁶²,⁶³. Reaction mixtures were prepared in twin.tec Real-Time PCR Plates (Eppendorf) and contained 5× SYPRO Orange (Sigma-Aldrich), 2.5 µM protein, and 2 µl 10× ligand in a total volume of 20 µl DSF buffer. The plate was sealed with optically clear sealing film and centrifuged at 2,000g for 1 min before loading into the real-time PCR instrument. The temperature was ramped at a rate of 1% (approximately 1.33 °C min⁻¹), typically over a 60 °C window centred on the melting temperature (T_M) of the target protein. Fluorescence was monitored using the ROX channel. T_M values were determined by taking the derivative of fluorescence intensity with respect to temperature and fitting the resulting data to a quadratic equation in a 6 °C window in the vicinity of the T_M in R software.
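
The T_M determination described above can be sketched as follows. This is an illustration in Python with NumPy rather than the R code used in the study; the function name `estimate_tm`, the choice of centring the window on the maximum of the derivative, and the synthetic melting curve are all assumptions.

```python
import numpy as np

def estimate_tm(temperature, fluorescence, window=6.0):
    """Estimate T_M as the vertex of a quadratic fitted to dF/dT near its maximum."""
    temperature = np.asarray(temperature, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    # First derivative of fluorescence with respect to temperature.
    dfdt = np.gradient(fluorescence, temperature)
    # Rough T_M: the temperature at which the derivative is largest.
    t_peak = temperature[np.argmax(dfdt)]
    # Keep only points inside the window (6 degrees C by default) around the rough T_M.
    mask = np.abs(temperature - t_peak) <= window / 2
    # Fit dF/dT = a*T^2 + b*T + c; the vertex -b/(2a) is the refined T_M.
    a, b, _ = np.polyfit(temperature[mask], dfdt[mask], 2)
    return -b / (2 * a)

# Synthetic two-state melting curve with a midpoint of 55 degrees C.
temps = np.arange(25.0, 85.0, 0.5)
signal = 1.0 / (1.0 + np.exp(-(temps - 55.0) / 2.0))
tm = estimate_tm(temps, signal)
print(f"estimated T_M: {tm:.1f}")
```

Fitting a quadratic to the derivative peak gives a T_M that is not limited to the sampled temperature grid, which matters when shifts of a couple of degrees are being compared.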

Proteins were initially screened for binding to metabolites in four Phenotype MicroArray plates, PM1 to PM4 (Biolog). The contents of each well were dissolved in 50 µl (PM1 to PM3) or 20 µl (PM4) sterile filtered water, giving a concentration of approximately 10–20 mM in each well⁶³. The plates were then sealed with aluminium sealing films and stored at −80 °C. Prior to use, the plates were thawed at room temperature and then shaken at 30 °C until the compounds had redissolved. Two microlitres of each compound was added to 18 µl reaction mixture prepared as described above. A 2 °C increase in T_M compared with the median value across the plate was taken as indicative of binding⁶³,⁶⁴.
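
The hit criterion above amounts to comparing each well with the plate median. A minimal sketch is given below; the well names and T_M values are invented for illustration, and treating a shift of exactly 2 °C as a hit is an assumption.

```python
from statistics import median

def call_hits(tm_by_well, threshold=2.0):
    """Return wells whose T_M exceeds the plate median by at least `threshold` degrees C."""
    plate_median = median(tm_by_well.values())
    return {
        well: round(tm - plate_median, 2)
        for well, tm in tm_by_well.items()
        if tm - plate_median >= threshold
    }

# Hypothetical T_M values (degrees C) for five wells of one plate.
plate = {"A01": 50.1, "A02": 49.9, "A03": 50.0, "A04": 56.3, "A05": 50.2}
hits = call_hits(plate)
print(hits)
```

Using the median rather than the mean as the baseline keeps a handful of strongly stabilizing ligands from shifting the reference value.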

For screening of individual compounds and confirmatory assays, compounds were dissolved at a concentration of 100 mM in ligand buffer (0.1 M HEPES pH 7.5), and the pH was adjusted with 1 M NaOH or 1 M HCl if necessary (specifically, if the pH of a 10 mM solution of the compound diluted in DSF buffer fell outside the range 6.5–8.0). These stock solutions were stored at −20 °C. Two microlitres of each compound was directly added to 18 µl reaction mixture, giving a final concentration of 10 mM, or first diluted 10-fold or 100-fold in DSF buffer to give final concentrations of 1 mM or 0.1 mM in the assay. A list of chemicals used for screening, including the supplier and catalogue number, is provided in Supplementary Table 3. Sodium (R)- and (S)-2,3-dihydroxypropane-1-sulfonate were synthesized from (R)- and (S)-3-chloro-1,2-propanediol following a literature protocol⁶⁵ and verified by ¹H and ¹³C NMR.

For the TRAP and TTT SBPs, SAR11_0864 and SAR11_1203, we hypothesized that a metal ion might be required for high-affinity binding. For SAR11_0864, the melting curve observed in the presence of isethionate in Biolog screening experiments was biphasic, suggesting the presence of a mixture of active and inactive protein; for SAR11_1203, there was a discord between the highly charged ligand and the largely uncharged binding site of the SBP. Therefore, we tested the effect of the addition of metal ions (Mg²⁺, Ca²⁺, K⁺, Zn²⁺, Mn²⁺, Co²⁺, Ni²⁺, Fe²⁺ and Fe³⁺) on binding of isethionate to SAR11_0864 and citrate to SAR11_1203 by DSF (Supplementary Fig. 6). DSF experiments were performed using refolded protein as described above, with the addition of 1 mM metal ion and 1 mM ligand. Based on these results, and considering the concentration of each metal ion in seawater⁶⁶, 10.3 mM CaCl₂ (SAR11_0864) or 53 mM MgSO₄ (SAR11_1203) was included in subsequent DSF and ITC binding experiments for these SBPs.

Isothermal titration calorimetry

ITC experiments were performed using a MicroCal PEAQ-ITC system (Malvern Panalytical). Protein samples were refolded and freshly purified (not frozen), and protein and ligand samples were prepared in the same batch of DSF buffer used for size-exclusion chromatography to minimize the heat of dilution. For SAR11_0864 and SAR11_1203, calcium chloride (final concentration 10.3 mM) or magnesium sulfate (final concentration 53 mM), respectively, was added to the protein and ligand samples. Experiments were performed at 25 °C with stirring at 700 rpm and 10 µcal s⁻¹ reference power. Titration parameters were varied depending on the protein yield, the fraction of active protein, and the affinity and enthalpy of the interaction. In a typical titration, 35 µM protein was titrated with 1 × 0.4-µl and 19 × 1.6-µl injections of ligand, with the ligand concentration chosen to give >1.5-fold molar excess of ligand to active protein at the end of the titration. ITC experiments were generally performed at least in duplicate.

For simple 1:1 binding interactions, the association constant (K_a), enthalpy (ΔH), and stoichiometry (n) of the interaction were determined by fitting the data to the one-set-of-sites model in MicroCal PEAQ-ITC analysis software. In the case of the SAR11_0769 + d-glucose interaction, thermodynamic parameters were estimated through Bayesian fitting to a modified competitive binding model, which incorporated an additional parameter to account for the fraction of the ligand in each anomeric form, and a two-sets-of-sites model implemented in pytc software⁶⁷; the latter model is equivalent to the two-sets-of-sites model in the MicroCal software, except without the minor correction for heat associated with the displaced volume for each injection (for consistency with the other models in pytc). Thermodynamic parameters for the SAR11_0953 + l-glutamate, SAR11_1203 + citrate, SAR11_1210 + l-arginine, SAR11_1336 + glycine betaine, and SAR11_1346* + l-leucine interactions were determined through competitive displacement experiments⁶⁸, in which l-phenylalanine, cis-aconitate, d-octopine, glycine, or l-serine (respectively) were included at a fixed concentration in the cell to reduce the apparent binding affinity for the ligand of interest. The data for these competitive binding experiments were analysed by Bayesian fitting to the competitive binding sites model in pytc software. To confirm the high affinity of the SAR11_1210 + l-arginine interaction, a competitive binding experiment was performed where SAR11_1210 and ArgT from S. enterica (which has a K_d of 15 nM for l-arginine) were included in the cell together at the same concentration (28 µM) and titrated with l-arginine. Similarly, for the SAR11_1210(E108A) + l-arginine interaction, a mixture of SAR11_1210(E108A) and SAR11_1210 (35 µM each) was titrated with l-arginine. For these titrations, the data were fitted to a two-sets-of-sites binding model as described above to obtain thermodynamic parameters for both protein–ligand interactions. For all analyses, the heat of dilution was assumed to be a small constant value and included as a fitted parameter in the model. The validity of this assumption was confirmed for each ligand by performing a control titration where the ligand was injected into DSF buffer.
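
The reason a weaker competitor makes an ultra-tight interaction measurable can be illustrated with the standard relationship for competitive displacement, K_d,app = K_d × (1 + [competitor]/K_d,competitor). The sketch below is not the pytc model used for fitting; it assumes that the competitor is in large excess over the protein (so its free concentration is close to its total concentration), and the numerical values are hypothetical rather than taken from this study.

```python
def apparent_kd(kd_ligand, kd_competitor, competitor_conc):
    """Apparent K_d of a ligand when a competitor occupies the same site.

    All quantities are in molar units. Assumes the competitor is in large
    excess over the protein, so its free concentration equals its total.
    """
    return kd_ligand * (1.0 + competitor_conc / kd_competitor)

# Hypothetical example: a 1 nM ligand measured with 1 mM of a 10 uM competitor.
kd_app = apparent_kd(kd_ligand=1e-9, kd_competitor=1e-5, competitor_conc=1e-3)
print(f"apparent K_d: {kd_app:.2e} M")
```

In this hypothetical case the apparent affinity is weakened about 100-fold, which moves the titration curve into a range where its shape depends measurably on the binding constant.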

Spectrophotometric analysis of iron(iii) binding

Binding of iron(iii) to SAR11_1238 was analysed using a spectrophotometric assay based on literature protocols⁶⁹,⁷⁰. UV–vis spectra were recorded at room temperature (25 °C) in a 96-well plate from 300 nm to 630 nm with 1 nm bandwidth using a Multiskan GO spectrophotometer (Thermo Scientific). An initial protein concentration of 100 µM and an initial volume of 200 µl were used for all spectrophotometric assays. First, purified SAR11_1238 was thawed and exchanged into 50 mM Tris, 200 mM NaCl buffer (pH 8.0) using a centrifugal concentrator, and the spectrum of the resulting protein sample was recorded. To prepare unliganded protein for iron-binding assays, the protein was exchanged into 50 mM Tris, 200 mM NaCl, 20 mM sodium citrate buffer (pH 8.0) by three rounds of 30-fold dilution and concentration, allowing chelation and removal of the metal ligand. Citrate was then removed by four rounds of 30-fold dilution and concentration with 50 mM Tris, 200 mM NaCl buffer (pH 8.0). Binding assays were performed by titrating the unliganded protein (200 µl of 100 µM solution) with 8 × 5-µl or 10 × 5-µl injections of 800 µM iron(iii) solution, which was prepared from iron(iii) chloride and a 2.5-fold molar excess of trisodium citrate (which ensures that the iron(iii) remains soluble) in ultrapure water. To confirm that SAR11_1238 binds iron(iii) rather than the iron(iii)–citrate complex, the protein was also titrated under the same conditions with 800 µM ammonium iron(ii) sulfate; under the aerobic conditions of the assay, iron(ii) is rapidly oxidized to iron(iii)⁶⁹. UV–vis spectra were recorded 1 min (iron(ii)) or 15 min (iron(iii)) after each injection. Finally, a competitive binding assay with citrate was used to estimate the affinity of SAR11_1238 for iron(iii). The protein was saturated with a twofold molar excess of iron(iii) solution, diluted to a volume of 1 ml, and then dialysed against 500 ml of 50 mM Tris, 200 mM NaCl buffer (pH 8.0) at 4 °C overnight to remove excess iron(iii) and citrate. The protein was then concentrated to 100 µM and titrated with 5-µl injections of 8 twofold serial dilutions of 500 mM sodium citrate (adjusted to pH 8.0 in 50 mM Tris, 200 mM NaCl buffer). The absorbance at 440 nm was recorded 5 min after each addition. The data were fitted to a hyperbolic curve, yielding an apparent K_d of 9.0 mM for citrate. Given that citrate has a K_d of ~10⁻¹⁷ M for iron(iii), this implies that SAR11_1238 has a K_d for iron(iii) on the order of 10⁻¹⁹ M, similar to previously characterized iron(iii)-binding proteins⁷⁰,⁷¹.
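
One way to see how the order-of-magnitude estimate follows from the citrate competition data is a simple 1:1 exchange model, sketched below. This is an illustrative back-of-the-envelope calculation and not necessarily the calculation performed in the study: it assumes that free iron(iii) is negligible, that a single citrate competes with the protein for each iron(iii), that half of the 100 µM protein has released its iron at the apparent K_d, and it ignores dilution during the titration. The function name is hypothetical.

```python
def protein_kd_from_exchange(kd_competitor, complex_conc, free_competitor_conc):
    """K_d of the protein for the metal, from a competitor exchange midpoint.

    For P.Fe + Cit <-> P + Fe.Cit, the exchange constant is
    K_d(protein) / K_d(competitor) = [P][Fe.Cit] / ([P.Fe][Cit]).
    At the midpoint [P] = [P.Fe], so the ratio reduces to [Fe.Cit] / [Cit].
    """
    return kd_competitor * complex_conc / free_competitor_conc

kd_iron = protein_kd_from_exchange(
    kd_competitor=1e-17,         # citrate K_d for iron(iii), M (value quoted in the text)
    complex_conc=50e-6,          # half of the 100 uM protein has released iron, M
    free_competitor_conc=9.0e-3, # apparent K_d for citrate, M
)
print(f"estimated K_d for iron(iii): {kd_iron:.1e} M")
```

Under these assumptions the estimate falls between 10⁻²⁰ M and 10⁻¹⁹ M, which is consistent with the order of magnitude stated above given the uncertainty in the citrate K_d.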

X-ray crystallography

For the SAR11_0769/d-glucose and SAR11_1210/l-arginine structures, the proteins were first expressed and purified by nickel affinity chromatography under native conditions as described above. After addition of a 20-fold molar excess of d-glucose (SAR11_0769) or l-arginine (SAR11_1210), the protein was purified further by size-exclusion chromatography on a HiLoad 26/600 Superdex 75 pg column (Cytiva), eluting in 3× crystallization buffer (60 mM HEPES, 150 mM NaCl, pH 7.5). Fractions containing the target protein were collected, and d-glucose (SAR11_0769) or l-arginine (SAR11_1210) was added to a concentration of 30 µM. The protein was concentrated to a volume of ~500 µl, diluted threefold in water to reduce the NaCl concentration to 50 mM, and then concentrated further to 12 mg ml⁻¹. For the SAR11_0769/d-galactose and SAR11_0655/l-pyroglutamate structures, the proteins were expressed and purified in the same way, except that no ligands were added. Protein crystals were obtained using the vapour diffusion method in hanging drops at 20 °C, then cryoprotected and flash-frozen in liquid nitrogen. Crystallization and cryoprotection conditions for each protein are given in Supplementary Methods. X-ray diffraction data were collected on beamline BL32XU at the SPring-8 synchrotron (Harima, Japan), using the ZOO suite for automated data collection⁷². The data were automatically indexed, integrated, scaled and merged in XDS⁷³ using KAMO⁷⁴. The structures were solved by molecular replacement in Phaser⁷⁵ or MOLREP⁷⁶. For SAR11_1210, the structure of an opine-binding protein from Agrobacterium fabrum (PDB ID 5OT8) was used as a search model; in the remaining cases, an AlphaFold2 model was used⁷⁷. The structures were then refined by iterative real-space and reciprocal-space refinement in REFMAC⁷⁸, Phenix⁷⁹, and COOT⁸⁰. Data collection and refinement statistics are given in Supplementary Table 10 and Supplementary Table 11. Structures were visualized in PyMOL.

Gas chromatography–mass spectrometry

SBPs purified under native conditions were exchanged into 200 mM ammonium acetate using a PD-10 desalting column (Cytiva) and concentrated to ~1 mM. A 10-nmol aliquot of protein was mixed with 10 µl of 300 µM α-methylglucopyranoside (as an internal control) and 200 µl methanol. The mixture was agitated at 1,500 rpm at 24 °C for 10 min and then centrifuged at 21,000g for 20 min at 4 °C. The supernatant was evaporated to dryness using a vacuum evaporator, redissolved in 20 µl anhydrous pyridine, and derivatized by addition of 30 µl N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) containing 1% trimethylchlorosilane (Supelco) followed by incubation at 70 °C for 1 h. In the case of SAR11_1361, the dried sample was instead dissolved in 20 µl of 20 mg ml⁻¹ methoxyamine hydrochloride in anhydrous pyridine and incubated at 37 °C for 90 min with agitation at 750 rpm before addition of the MSTFA mixture. The derivatized samples were injected immediately onto an Agilent 7890A GC System (Agilent Technologies) equipped with a PAL COMBI-XT autosampler (CTC Analytics) and connected to a PEGASUS 4D GC×GC TOF-MS instrument (LECO) operating in one-dimensional mode. The GC was fitted with a DB-1MS column (Agilent Technologies) with 30 m length, 0.25 mm internal diameter, and 0.25 µm film thickness. The instrument was operated in pulsed split mode with a split ratio of 2 and injection volume of 1 µl. The inlet temperature was 250 °C. Helium was used as the carrier gas with a flow rate of 1 ml min⁻¹. The GC oven temperature was held at 70 °C for 5 min, then raised at 12 °C min⁻¹ to 300 °C, and finally held at 300 °C for 10 min. Mass spectrometry data were collected from 50 to 500 m/z after a 6.5-min solvent delay. The ion source and transfer line temperatures were 250 °C and the ionization energy was 70 eV. Data analysis and spectral database searches against the NIST database were performed using ChromaTOF software (LECO). Protein-derived samples were analysed before control samples to prevent carryover.
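
For orientation, the length of the GC oven programme described above can be computed from the hold times and the ramp rate. The sketch below is simple arithmetic; the function name is hypothetical, and it ignores any instrument equilibration or cool-down time.

```python
def oven_run_time_min(initial_hold, start_temp, end_temp, ramp_rate, final_hold):
    """Total oven programme time in minutes for a single linear ramp."""
    ramp_time = (end_temp - start_temp) / ramp_rate
    return initial_hold + ramp_time + final_hold

# 70 C for 5 min, 12 C/min to 300 C, then 300 C for 10 min.
total_min = oven_run_time_min(5.0, 70.0, 300.0, 12.0, 10.0)
print(f"oven programme: {total_min:.1f} min")
```

The ramp itself takes about 19 min, so each injection occupies the oven for roughly 34 min, of which the first 6.5 min fall within the solvent delay.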

Biogeographical analysis

Biogeographical analysis was performed using the Ocean Gene Atlas v2.0 server³³. Abundance data for each SBP gene from Ca. P. ubique HTCC1062 in the Tara Oceans OM-RGC_v2_metaG and OM-RGC_v2_metaT datasets were obtained through a BLAST search with a stringent e-value threshold of 10⁻³⁰. To avoid inclusion of homologous SBPs with different transport functions, hits with a sequence identity of less than 40% (for ABC SBPs) or 55% (for TRAP and TTT SBPs) compared with the corresponding HTCC1062 SBP were excluded from the analysis.
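
The hit-filtering rule described above can be sketched as a simple filter. The `Hit` record, its field names and the example hits are hypothetical; in practice these values would come from the BLAST output returned by the server.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    gene_id: str
    evalue: float
    percent_identity: float

# Minimum sequence identity (%) to the HTCC1062 SBP, by transporter family.
MIN_IDENTITY = {"ABC": 40.0, "TRAP": 55.0, "TTT": 55.0}

def filter_hits(hits, family, max_evalue=1e-30):
    """Keep hits that pass both the e-value and the family-specific identity cutoff."""
    cutoff = MIN_IDENTITY[family]
    return [h for h in hits if h.evalue <= max_evalue and h.percent_identity >= cutoff]

# Invented hits for illustration only.
hits = [
    Hit("g1", 1e-80, 72.0),
    Hit("g2", 1e-35, 38.5),  # fails the identity cutoff for every family
    Hit("g3", 1e-12, 60.0),  # fails the e-value threshold
    Hit("g4", 1e-40, 47.0),  # passes for ABC, fails for TRAP and TTT
]
kept_abc = [h.gene_id for h in filter_hits(hits, "ABC")]
kept_trap = [h.gene_id for h in filter_hits(hits, "TRAP")]
print(kept_abc, kept_trap)
```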

To estimate the total abundance of SBP transcripts, abundance data for each of the 38 Pfam families in CL0177 (PBP; periplasmic binding protein) and CL0144 (Periplas_BP; periplasmic binding protein like), excluding the transferrin family (PF00405) and any families that contain solely enzymes or transcription factors (PF00800, PF01379, PF01634, PF02621, PF03466, PF09084), were obtained using a HMMER search of the OM-RGC_v2_metaT dataset with an e-value threshold of 10⁻¹⁰. Hits were obtained for 26 out of 31 Pfam families. For each Pfam family, the corresponding hidden Markov model (HMM) was obtained from the InterPro database⁸¹. The protein sequences from the HMMER search were then aligned to this HMM using hmmalign and used to construct a new HMM using hmmbuild in HMMER 3.4 (http://hmmer.org). A second HMMER search of the OM-RGC_v2_metaT dataset, with a less stringent e-value threshold of 10⁻⁵, was then conducted using the resulting HMM. The hits from all 52 searches were combined and redundant hits were removed, resulting in a total of 211,222 unique SBP genes. The two-step search recovered 94% of the 23,879 genes identified as homologues of the Ca. P. ubique HTCC1062 SBPs in the BLAST analysis before application of a sequence identity threshold; the remaining 1,267 genes were also added to the list of SBP genes. Finally, the total abundance of SBP genes at each site was calculated.
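
The bookkeeping in this paragraph (31 families searched, 52 searches combined, redundant hits removed) can be sketched as follows. The gene identifiers are invented, and the function name `merge_hits` is hypothetical; the real hit lists would be parsed from the HMMER output.

```python
def merge_hits(searches):
    """Combine hit identifiers from several searches, dropping duplicates."""
    unique = set()
    for hit_ids in searches:
        unique.update(hit_ids)
    return unique

families_total = 38
excluded = 1 + 6                       # transferrin family plus six enzyme/regulator families
searched = families_total - excluded   # families actually searched
families_with_hits = 26
n_searches = 2 * families_with_hits    # first-round and second-round search per family

# Toy example: three searches with overlapping hits.
toy_searches = [["gA", "gB"], ["gB", "gC"], ["gC", "gD", "gA"]]
combined = merge_hits(toy_searches)
print(searched, n_searches, sorted(combined))
```

Deduplication is needed because a gene can be recovered both by the first-round and second-round search for one family and by searches for related families in the same clan.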

To estimate the percentage of SAR11 bacteria at a site containing a given SBP from Ca. P. ubique HTCC1062, we used the recruitment values of 159 SAR11 genomes in the Tara Oceans metagenome dataset calculated by Haro-Moreno et al.³⁴. The presence of a homologue of each SBP in each of the corresponding genomes was determined by BLAST using a 50% sequence identity and 50% coverage threshold. The relative abundance of SAR11 bacteria containing a given SBP homologue was then calculated for each station. Plots were generated using R and GraphPad Prism.
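
A minimal sketch of the per-station calculation is shown below, assuming that the relative abundance is the recruitment-weighted fraction of SAR11 genomes carrying a homologue. That weighting, the function name and the example values are assumptions made for illustration.

```python
def percent_with_sbp(recruitment, has_homologue):
    """Percentage of SAR11 recruitment at a station from genomes carrying the SBP.

    recruitment: genome name -> recruitment value at this station
    has_homologue: genome name -> True if a homologue passed the BLAST thresholds
    """
    total = sum(recruitment.values())
    if total == 0:
        return 0.0
    carrying = sum(
        value for genome, value in recruitment.items() if has_homologue.get(genome, False)
    )
    return 100.0 * carrying / total

# Invented recruitment values for three genomes at one station.
station = {"genome_1": 30.0, "genome_2": 50.0, "genome_3": 20.0}
presence = {"genome_1": True, "genome_2": False, "genome_3": True}
pct = percent_with_sbp(station, presence)
print(f"{pct:.1f}% of SAR11 recruitment carries the SBP")
```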

Phylogenetic analysis

Protein sequences homologous to the SBP of interest were identified via a BLAST search of the UniProtKB Reference Proteomes and Swiss-Prot databases⁸². The resulting sequences were filtered to remove a small number of unusually long sequences (>20% greater than mean length) and aligned in MUSCLE v3.8.31⁸³. The alignment was trimmed in trimAl v1.2 using the automated1 option⁸⁴ and then used to generate a maximum-likelihood phylogeny in FastTree v2.1.11, using LG + Γ₂₀ as the substitution model⁸⁵. For each protein sequence in the tree, the fraction of conserved binding site residues, compared with the corresponding protein from Ca. P. ubique HTCC1062, was estimated. The binding site residues were obtained from the crystal structure (SAR11_0769) or estimated from an AlphaFold2 model⁸⁶,⁸⁷. For this analysis, the following substitutions were treated as conservative: S/T, I/M, V/L, I/V, L/M, D/E, Q/N, A/V, F/Y, Y/W, F/W. Phylogenetic tree figures were generated using the ggtree package in R⁸⁸. Figures showing taxonomic distribution (Extended Data Fig. 8b) were generated using Krona⁸⁹.
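
The conservation score described above can be sketched as follows, using the list of conservative substitutions given in the text. The sketch assumes that the binding site residues have already been extracted from the alignment as two equal-length strings, and that each substitution is treated symmetrically; the function names and example residues are hypothetical.

```python
# Conservative substitutions listed in the text, stored as unordered pairs.
CONSERVATIVE_PAIRS = {
    frozenset(pair)
    for pair in ["ST", "IM", "VL", "IV", "LM", "DE", "QN", "AV", "FY", "YW", "FW"]
}

def is_conserved(ref, other):
    """True if the residues are identical or form a listed conservative pair."""
    return ref == other or frozenset((ref, other)) in CONSERVATIVE_PAIRS

def conserved_fraction(ref_residues, other_residues):
    """Fraction of binding site positions that are conserved relative to the reference."""
    if len(ref_residues) != len(other_residues) or not ref_residues:
        raise ValueError("residue strings must be non-empty and of equal length")
    matches = sum(is_conserved(r, o) for r, o in zip(ref_residues, other_residues))
    return matches / len(ref_residues)

# Invented six-residue binding sites: only the R -> K position is not conserved.
frac = conserved_fraction("DSYWRE", "ETFWKE")
print(f"{frac:.2f}")
```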

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

