Cryptic variation fuels plant phenotypic change through hierarchical epistasis

July 11, 2025

89

Motif enrichment and variant discovery

FIMO motif enrichment was performed on the sequence of open chromatin regions in the tomato meristem upstream of SlEJ2 using the A. thaliana non-redundant motif database curated at Plant TFDB (P < 0.00001 and q < 0.01)⁵⁶. The same regions were used to search for insertion–deletion (indel) variants called previously from the tomato pangenome⁵. Indels overlapping with annotated motifs were confirmed to not exist in linkage with previously reported EJ2 variants (ej2^w, sb3) by PCR and then used for subsequent experiments^27,57 (see the ‘Plant materials’ section below; Supplementary Table 1).

Plant materials

Seeds of WT S. lycopersicum (cultivar M82, LA3475), S. habrochaites (LA1777) and S. pennellii (LA0716) were from our stocks. Introgression line IL3-4 (S. pennellii chromosome 3 introgressed into M82, LA4046) was obtained from the Tomato Genome Resource Center (Department of Plant Sciences, University of California at Davis) and the variant was validated by PCR amplification and Sanger sequencing⁵⁸ (a list of all of the primers used in this study is provided in Supplementary Table 27). Two overlapping S. habrochaites chromosome 3 introgression lines LA3925 and LA3926, introgressed into tomato cultivar TA209, were obtained from the Tomato Genome Resource Center (Department of Plant Sciences, University of California at Davis)⁵⁹. After validation by PCR amplification and Sanger sequencing, only LA3925 contained the ShEJ2^pro-3 variant of interest, so LA3926 was used as a control for crosses between M82 and the introgressed region in the TA209 background. Mutants j2-TE ej2^w and j2 ej2 were from our stocks, as previously described²⁷.

Genome editing

CRISPR–Cas9 mutagenesis and generation of transgenic tomato plants were performed according to our standard protocol⁶⁰. In brief, gRNAs were designed using Geneious Prime (https://www.geneious.com/) (a list of the gRNAs used in this study is provided in Supplementary Table 27). For Cas9 multiplex editing, the Golden Gate cloning system was used to assemble the binary vector containing the Cas9 and the specific gRNAs^60,61. For SpRY editing, vectors were constructed through a modular Gateway assembly, as described previously (Invitrogen)⁶². The final binary vectors were then transformed into the tomato cultivar M82 by Agrobacterium tumefaciens-mediated transformation through tissue culture⁶³. First-generation transgenic plants (T₀) were genotyped with specific primers surrounding the target sites (a list of all of the primers used in this study is provided in Supplementary Table 27). To purify alleles from potential spontaneous mutations or CRISPR–Cas9 off-target effects after plant transformation, all T₀ transgenic lines were backcrossed (BC1) to parental WT plants. BC1 populations were then screened by PCR and kanamycin herbicide susceptibility for plants lacking the Cas9 transgene, PCR products of the targeted regions were Sanger sequenced to confirm inheritance of alleles and allele-specific genotyping assays were designed for genotyping in subsequent generations. Selected BC1 plants were self-fertilized to generate F₂ populations, and these segregating populations were used to validate the phenotypic effects of each allele by co-segregation. F₂ or F₃ homozygous mutant plants were then used for subsequent crossing and quantitative phenotypic analyses.

Growth conditions and phenotyping

Seeds were directly sown in soil in 96-cell plastic flats and grown to 4-week-old seedlings in the greenhouse. The seedlings were then transplanted to 4 l pots in the greenhouse for crossing and bulking purposes or directly to the fields at Cold Spring Harbor Laboratory, New York or at The University of Florida Gulf Coast Research and Education Center. Greenhouse conditions are long-day (16 h light, 26–28 °C followed by 8 h dark, 18–20 °C; 40–60% relative humidity) with natural light supplemented with artificial light from high-pressure sodium bulbs (~250 μmol m⁻² s⁻¹). Plants in the fields were grown under drip irrigation and standard fertilizer regimes, and were used for quantifications of inflorescence branching, fruit shape and sepal length.

To quantify inflorescence branching, inflorescences were counted in order of emergence in two rounds, approximately 60 days after sowing and 75 days after sowing. When available, four primary inflorescences and six axillary inflorescences were counted per plant. 60 or fewer branches were counted, if branching exceeded 60, too many to count (TMTC) was recorded and the number of branching events was treated as 60 for downstream analysis. Proliferated meristem in the place of inflorescence was indicated in the data as proliferated. Occasionally, inflorescences would fail to develop into countable structures, possibly due to stress, in which case, inhibited was recorded.

To quantify fruit shape, ten fruits were collected at the mature green stage, cut in transverse sections and scanned on a single plane. The ratio of maximum height to width, fruit shape index I, was determined from scanned images using Tomato Analyzer⁶⁴. To quantify sepal length, ten closed mature floral buds of similar developmental stage (1–2 days before anthesis, that is, before flower opening) per genotype were collected, length of sepals and petals were manually measured and the sepal/petal ratio was calculated²⁷.

Phylogenetic trees

J2/EJ2 phylogeny was adapted from a previous study⁴. Putative orthologues of SlPLT3/7 and AtPLT3/7 were identified using NCBI BLASTP against proteomes of species selected for taxonomic breadth, representing asterids, rosids, early eudicots and monocots. Retrieved protein sequences were aligned using MAFFT (v.7.505) using the default parameters. An HMM profile was constructed from the alignment using hmmbuild in HMMER (v.3.3.2) and used to search combined species proteomes with hmmsearch (E < 1 × 10⁻⁵) to identify additional homologs. All hits were extracted, aligned with MAFFT and manually trimmed when necessary. A maximum-likelihood phylogenetic tree was inferred using IQ-TREE (v.2.2.2) with automatic model selection (-m MFP) and 1,000 ultrafast bootstrap replicates (-bb 1000). The resulting tree was rooted using XP_042461702.1_Zofficinale as an outgroup. Bootstrap support values were used to modulate branch thickness in the visualization: branches with support >90 were plotted thickest, those between 75–90 were medium and those <75 remained thin. The tree was visualized in R using the ape package (v.5.8-1).

RNA extraction and Illumina sequencing

Inflorescence meristems were collected from n = 4 plants at 8 weeks old under stereoscope magnification. Tissue was frozen, ground with beads and RNA was extracted using TRIzol (Invitrogen) and the Direct-zol RNA Miniprep kit with on-column DNA digestion (Zymo Research). RNA was quantified using the Qubit fluorimeter RNA HS assay kit (Invitrogen). The samples were treated with the Ribo-Zero rRNA removal kit (Epicenter) and the libraries prepared with the TruSeq V2 RNA-seq prep kit (Illumina).

RNA-seq analysis

Published RNA-sequencing (RNA-seq) data of WT M82, ej2, j2 and anantha mutant meristems were downloaded from Sequence Read Archive (SRA) PRJNA376115 and PRJNA343677 (refs. ^24,34). Reads were trimmed with Trimmomatic (ILLUMINACLIP:TruSeq2-PE.fa:2:30:10:1:FALSE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36) and aligned to the cDNA annotation of the reference genome sequence of tomato (SL4.0) using STAR (v.2.6.1.d)⁶⁵. Normalization and quantification of individual transcript expression was done in R by calculating transcripts per million (TPM). Differential expression was calculated in R by DESeq2 time course analysis with LRT and the top 200 most differentially expressed genes (log₂[FC]) across WT meristem maturation were used for PCA of all meristem samples using Python scikit-learn PCA.transform⁶⁶.

Dual-luciferase assay

A Gateway-compatible dual-luciferase reporter vector (pSZ106) was assembled using the MoClo GoldenGate assembly system^61,67. In brief, a Gateway AttR4-AttL1R cassette (Invitrogen) was cloned upstream of a 46 bp minimal 35S promoter driving the Firefly luciferase coding sequence (pICSL80001, pL0_fLUC-I (CDS1)) with a nopaline synthetase terminator (pICH41421)^61,67. A cauliflower mosaic virus 35S promoter (pICH51266) was cloned upstream of the coding sequence of Renilla luciferase (pSB123, pL0_rLUC-I (CDS1), Addgene) with a nopaline synthetase terminator (pICH41421)⁶⁷. Both luciferase expression cassettes were cloned into the pICSL4723 binary vector backbone with an NPTII selection cassette. SlEJ2^pro-3, ShEJ2^pro-3 and SpEJ2^pro-3 alleles were cloned into pDONR P4-P1r and introduced into pSZ106 by Gateway cloning (Invitrogen). SlERF12 (Solyc02g077840), SlPLT380 (Solyc05g051380) and SlPLT710-short (Solyc11g010710) were cloned into pDONR207 and introduced into pEAQ-HT-DEST3 by Gateway cloning (Invitrogen).

All binary expression vectors were transformed into A. tumefaciens and cultured at 28 °C overnight in selective media. Overnight cultures were diluted and grown at 28 °C to an optical density at 600 nm (OD₆₀₀) of 1 a.u., centrifuged and washed into inductive medium (10 mM MES pH 5.7, 10 mM MgCl₂, 100 µM 3′,5′-dimethyoxy-4′-hydroxyacetophenone) at OD₆₀₀ 1 a.u. Bacteria was induced for 3 h lying horizontally at the bench, then equal volumes of promoter and TF medium were combined and co-infiltrated into young fully expanded leaves of 4 week old N. benthamiana plants grown in long days (16 h–8 h light–dark, 22 °C; 40–60% relative humidity). Plants were returned to the growth chamber and 100 mg tissue was collected and frozen for measurement 3 days after infiltration.

Luciferase activity was measured using the Dual Luciferase Reporter Assay System kit (Promega) as described previously⁶⁸. In brief, tissue was homogenized in the Spex Sample Prep 2010 Geno/Grinder (Cole Parmer) and 10 mg of tissue powder was mixed with 100 µl of passive lysis buffer (Promega). Cellular debris was pelleted at 7,500g for 1 min and the supernatant was diluted 40× in passive lysis buffer and 15 µl of sample was transferred to three replicate wells of a white flat-bottom Costar 96-well plate (Corning). The assay was measured using a GloMax 96 microplate luminometer, and 75 µl per well of luciferase assay reagent and Stop & Glo reagent was added and measured stepwise (Promega).

Statistics and reproducibility

All phenotyping and molecular experiments were repeated in at least three seasons with similar results. Transcriptomics was performed once with biological replication. For Fig. 4e,f, n = 10 inflorescences per plant and the number of biologically independent plants per genotype–season combination varies, with quartiles at 2, 5 and 9 plants (source data are provided in Supplementary Table 24).

Segregating populations

BC1 inbred plants of the genotype EJ2pro j2 were crossed to BC1 inbred plants of the genotype plt3 plt7/+ and genotyped in the F₁ generation by allele-specific PCR to determine the presence of all desired alleles. Segregating F₂ seed was sown in the greenhouse in populations of either 192 or 384 plants, tissue was collected for DNA extraction and plants were transplanted without previous genotyping over the course of four seasons, two in fields at Cold Spring Harbor Laboratory, New York and two at The University of Florida Gulf Coast Research and Education Center. The genotypes of plants were confirmed after phenotyping by allele-specific PCR assays.

Linear regression models

Phenotypic data were summarized at the plant level for quantitative modelling, with abnormal inflorescences marked as proliferated or inhibited excluded from further analysis. For each plant i, we consider the total number of branching events across all inflorescences y_i and the number of inflorescences t_i. The total number of branching events y_i was modelled as being either Poisson or negative binomially distributed with exposure t_i.

$${y}_{i} \sim {\rm{Poisson}}\left({{t}_{i}\times f}^{-1}(\mu ({x}_{i}))\right)$$

$${y}_{i} \sim {\rm{Negative}}\,{\rm{Binomial}}\left({{t}_{i}\times f}^{-1}(\mu ({x}_{i})),\alpha \right),$$

where f represents a link function and μ(x_i) represents the phenotypic mean of the genotype x_i of plant i and α is the overdispersion parameter. Var(y_i) = μ(x_i) + αμ(x_i)² so α reflects the additional variance relative to the expectation under a Poisson model. Under an additive model, the expected mean μ_add(x) of any given genotype x is given by:

$${\mu }_{{\rm{add}}}({x}_{i})={\theta }_{0}+\sum _{l}{\theta }_{l}^{a}{s}_{l}^{a}({x}_{i})+{\theta }_{l}^{d}{s}_{l}^{d}({x}_{i}),$$

where ${s}_{l}^{a}({x}_{i})=\{-1,\,0,\,1\}$ and ${s}_{l}^{d}({x}_{i})=\{0,\,1,\,0\}$ if locus l in genotype x_i is homozygous WT, heterozygous mutant and homozygous mutant, respectively. Thus ${\theta }_{l}^{a}$ represents the homozygous effects at locus l, whereas ${\theta }_{l}^{d}$ represents the deviation of the heterozygous effect from the semi-dominant expectation.

A basis for pairwise interaction models was built by extending the basis for the additive model with additional basis vectors composed by taking the product of each possible pair of additive and dominance components⁶⁹. Under a pairwise model, the expected mean is given by:

$$\begin{array}{l}{\mu }_{{\rm{pw}}}({x}_{i})\,=\,{\mu }_{{\rm{add}}}({x}_{i})+\sum _{l,m}{\theta }_{l,m}^{a,a}{s}_{l}^{a}({x}_{i}){s}_{m}^{a}({x}_{i})\,+\,{\theta }_{l,m}^{a,d}{s}_{l}^{a}({x}_{i}){s}_{m}^{d}({x}_{i})\\ \,\,\,+\,{\theta }_{l,m}^{d,a}{s}_{l}^{d}({x}_{i}){s}_{m}^{a}({x}_{i})\,+\,{\theta }_{l,m}^{d,d}{s}_{l}^{d}({x}_{i}){s}_{m}^{d}({x}_{i}).\end{array}$$

Models were defined and fit using the statsmodels⁵⁵ Python package using Poisson and negative binomial likelihoods with the identity (f(x) = x) and log link functions (f(x) = log x). Genotype–season MLEs for the number of branching events were obtained by defining a dummy variable for each genotype–season combination that took a value of 1 for plants of that genotype and 0 otherwise, while assuming that all genotypes share the overdispersion parameter for the negative binomial likelihood function representing plant-to-plant variability that is jointly estimated with the genotype–season means. Confidence intervals for the MLEs were derived using statsmodels with a log-link between model parameters (genotype estimates) and the average number of branching events.

Hierarchical model

Data were modelled with a negative binomial likelihood function as explained in the ‘Linear regression models’ section. In this hierarchical model, each paralogue pair has an effect that is modelled separately through a complete pairwise interaction model into φ_PLT and φ_SEP, parametrized by the phenotypic effect between any genotype g (combination of WT, heterozygous or homozygous mutants) and the WT ${\theta }_{g}^{{\rm{PLT}}}$ and ${\theta }_{g}^{{\rm{SEP}}}$ at the PLTs or SEP pair of loci, respectively:

$${\varphi }_{{\rm{PLT}}}({x}_{i})=\sum _{g\ne {\rm{WT}}}{\theta }_{g}^{{\rm{PLT}}}{s}_{g}^{{\rm{PLT}}}({x}_{i}),$$

$${\varphi }_{{\rm{SEP}}}({x}_{i})=\sum _{g\ne {\rm{WT}}}{\theta }_{g}^{{\rm{SEP}}}{s}_{g}^{{\rm{SEP}}}({x}_{i}),$$

where ${s}_{g}^{{\rm{PLT}}}({x}_{i})$ and ${s}_{g}^{{\rm{PLT}}}({x}_{i})$ take value 1 if the genotype at the PLT or SEP loci match g and 0 otherwise. Note that φ_PLT takes a different value for every possible combination of mutations in PLT3 and PLT7 and φ_SEP takes a different value for every possible combination of mutations in EJ2 and J2, so that inferring the values for φ_PLT and φ_SEP is equivalent to allowing a full set of additive, dominance and pairwise interactions within each of PLT3/PLT7 and EJ2/J2. These two pairwise models are then combined through a multilinear function into the log-transformed average expected number of branching events μ(x_i), given by

$${\mu }_{{\rm{hierarchical}}}({x}_{i})={\theta }_{{\rm{WT}}}+{\varphi }_{{\rm{PLT}}}({x}_{i})+{\varphi }_{{\rm{SEP}}}({x}_{i})-{\theta }_{{\rm{Int}}}{\varphi }_{{\rm{PLT}}}({x}_{i}){\varphi }_{{\rm{SEP}}}({x}_{i}),$$

where θ_WT is the WT log-transformed expected branching events, φ_PLT controls the log effect of the relevant combination of mutations in PLT3 and PLT7 when placed in a WT EJ2 J2 background, φ_SEP controls the log effect of the relevant combination of mutations in EJ2 and J2 combinations in a WT PLT3 PLT7 background and θ_Int represents the masking interaction between the two phenotypes³⁶. Finally, as in the standard linear models from the previous section, the observed number of branching events y_i for a plant i with genotype x_i and t_i inflorescences is drawn from a negative binomial distribution with overdispersion parameter α:

$${y}_{i} \sim {\rm{Negative}}\,{\rm{Binomial}}({{t}_{i}\times e}^{{\mu }_{{\rm{hierarchical}}}({x}_{i})},\alpha ).$$

In summary, the hierarchical model is equivalent to fitting pairwise interactions within paralogue pairs, then combining the within pair effects through a multilinear interaction across pairs, and then transforming the result through an exponential function. Extended Data Fig. 4a shows a graphical representation of the complete model. This model was coded in PyTorch⁷⁰ and the maximum-likelihood solution was found running the Adam optimizer for 10,000 iterations and checking for convergence. Extended Data Fig. 4c shows the inferred model including all the EJ2^pro alleles and illustrates how the different layers of the hierarchical model are applied and combined together to predict the expected number of branching events for a given genotype.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Cryptic variation fuels plant phenotypic change through hierarchical epistasis

Motif enrichment and variant discovery

Plant materials

Genome editing

Growth conditions and phenotyping

Phylogenetic trees

RNA extraction and Illumina sequencing

RNA-seq analysis

Dual-luciferase assay

Statistics and reproducibility

Segregating populations

Linear regression models

Hierarchical model

Reporting summary

Cancer cells stay hidden using stolen mitochondria

My PI is not offering any support or guidance on my PhD project, what should I do?

How do vaccine cutbacks affect public health? Ask Japan

Most Popular

Anthropic appoints Irina Ghose, a former Microsoft India managing director, to lead its business in India, which has the second-largest user base for Claude...

Justin Herbert, Jaxson Dart’s Relationships Won’t Make It, Astrologer Claims

Luxembourg-based Hydrosat, which builds AI-based thermal infrared satellite tech to provide data for water resource management and more, raised a $60M Series B (Ingrid...

‘One Tree Hill’ Tyler Hilton Says He & Megan Park ‘Decided To End’ Marriage

Recent Comments

ABOUT US

POPULAR POSTS

Anthropic appoints Irina Ghose, a former Microsoft India managing director, to lead its business in India, which has the second-largest user base for Claude...

Justin Herbert, Jaxson Dart’s Relationships Won’t Make It, Astrologer Claims

Luxembourg-based Hydrosat, which builds AI-based thermal infrared satellite tech to provide data for water resource management and more, raised a $60M Series B (Ingrid...

POPULAR CATEGORY