Thursday, April 24, 2025
No menu items!
HomeNatureGenomic and genetic insights into Mendel’s pea genes

Genomic and genetic insights into Mendel’s pea genes

Plant materials and methods

Germplasm panel

A total of 697 accessions, maximizing genetic diversity, were selected from the JI Pisum Germplasm Collection for this study (Supplementary Table 1 and https://www.seedstor.ac.uk/). These germplasm accessions were introduced in 2019 to the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, China, where they are grown and investigated annually.

DNA extraction for whole-genome resequencing

Genomic DNA was extracted from approximately 50 mg leaf tissue of 3-week-old seedlings. Extraction used the oKtopure system (LGC Biosearch Technology) following tissue desiccation with silica for 48 h. A bespoke protocol was used with the following volumes per sample: 250 µl lysis buffer, 170 µl binding buffer, 20 µl sbeadexTM suspension, 300 µl PN1 wash buffer, 300 µl PN2 wash buffer, 300 µl PN2 wash buffer (×3 wash cycles) and using 75 µl final elution buffer. For each accession, a minimum of 6 μg of genomic DNA was used to construct a 150-bp paired-end sequencing library with an insert size of 500 bp, following the manufacturer’s protocols (employing PCR-free methods), which was subsequently sequenced on the DNBSEQ Platform at BGI-Shenzhen resulting in ~80 Gb clean reads with a coverage of ~20× for each accession.

Phenotyping

DNA was extracted from a single plant whose seed was bulked up for progeny phenotyping. The diversity panel was planted in three different sites, Norwich, UK (52.62° N, 1.28° E), Shenzhen (Southern China, 22.61° N, 114.51° E) and Harbin (northern China, 45.86° N, 126.83° E). In China, four rounds of phenotyping were conducted. Specific subsets of accessions and some F2 populations were grown indoors in the greenhouse of Shenzhen Agricultural Field Farm, with 16 h of light/8 h of darkness. Phenotypes collected at the three stations (2020–2023) and a historical JIC phenotype dataset were curated in Seedstor (https://www.seedstor.ac.uk/). In Shenzhen, peas were planted in winter (October) and harvested in March the following year, whereas in Norwich and Harbin they were planted in spring (March to April) and harvested in August to October of the same year.

For the phenotyping of pod colour (green versus yellow-podded lines), a field trial of three 1 m2 microplots of 100 seeds each was sown in spring 2023, in which a 1:1 ratio of BC6 S3 Gp/Gp and gp/gp seeds (selfed seed of S2 homozygotes) were mixed and sown at random in each plot. At the pod filling stage, the Gp plants were tagged and at plot harvest seed was collected from individual plants to determine the yield of Gp and gp homozygotes. Seeds were weighed and counted on a Data Count R25+ machine (https://data-technologies.com/). Pod length and width were measured on 25 randomly selected pods. For the phenotyping of organ size, pod width (PW) and HSW were measured in mature pods of the F2 and F2:3 populations after harvest. In the F2 populations, PW was assessed using 15 representative pods, divided into 3 groups of 5, with the total width of each group measured sequentially. For the F2:3 populations, PW was determined using 5 representative pods, with their total width measured in a similar manner. HSW was calculated by randomly weighing 100 seeds from each accession and repeating the process 3 times to obtain an average weight for each accession. Other more specific phenotypes were collected as described in Supplementary Table 37 and in line with published descriptors (https://www.seedstor.ac.uk/search-phenotypes.php).

Construction of the pea genomic variation map

Read mapping, SNP calling and SNP annotation

The trimmed clean reads of each accession were aligned against the reference genome of pea (P. sativum) cultivar, ZW6 (ref. 18) and Caméor v.1.0 (ref. 17), using BWA-MEM (v.0.7.17) with default parameters63. Unmapped, non-unique and duplicated reads were filtered out using SAMtools64,65 (v.1.9) and Picard (v.2.20.3-SNAPSHOT) before variants were called by a standard pipeline of Genome Analysis Toolkit (GATK65 v.4.1.2) and Sentieon66 (v.202112.01). SNPs were further filtered to remove low-quality variants defined as (1) SNPs with more than two alleles; (2) SNPs with QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < −12.5, ReadPosRankSum < −8.0; (3) SNPs with observed heterozygosity (Hobs) exceeding the maximum calculated value (Hobs_max) based on the inbreeding coefficient (F), where F was calculated as 1 − (Hobs/Hexp), with Hexp defined as 2p(1 − p) using the frequency of the non-reference allele, and Hobs_max was determined as 10 × (1 − Fmedian) × Hexp for variants with F > 0 and minor allele frequency (MAF) > 0.05; (4) SNPs with missing rate >20% and MAF < 0.01. SnpEff67 (v.4.3t) was used to annotate the SNPs, and functional significance was then categorized on the basis of their positions with respect to genes (intergenic regions, exons, introns, splicing sites, untranslated regions, upstream and downstream regions) and mutation consequences (missense, start codon gain or loss, stop codon gain or loss and splicing mutations).

Identification of indels, gene PAV and gene CNVs, and SV

Small insertion–deletion mutants (indels; ≤50 bp) were called using GATK (v.4.1.2) and filtered following the criteria: QD < 2.0 || low_QD || FS > 200.0 || high_FS || ReadPosRankSum < −20.0 || low_ReadPosRankSum before they were annotated using SnpEff (v.4.3t). Read depth variation from read mapping analysis was used to identify gene presence and absence variation (PAV) and gene copy number variation (CNV) through normalization and correction in statistical analyses, following five steps: (1), mapped read depth at each gene was counted for each accession; (2), a correction for read depth variation (RDV) was applied, accounting for highly similar genes through all-versus-all coding sequence (CDS) alignment using BLASTN. Recently duplicated genes were collapsed into representative genes to minimize depth bias, which were further normalized by dividing the corrected read depth of the gene by the average sequencing depth of the accession; (3) the distribution of read depth versus GC content was used to correct read depth bias for each gene resulting from differential GC contents; (4), read depth variation was corrected for genomic regions with insertions or deletions in the genome reference; (5), subspecies-unique and shared CNVs were characterized by calculating the number of accessions with different copy numbers for each gene within each subspecies.

Different categories of structural variants (SVs: duplication, inversion, translocation and large-scale deletion or insertion) were detected on the basis of read mapping (read depth and read pair relationships) on PCR-duplicate-marked bam files using Delly (v 0.8.7) with default parameters; a summary of SVs identified is given in Supplementary Table 11.

Linkage disequilibrium analysis and pea haplotype map

A two-step LD pruning process was implemented to generate a high-quality core SNP dataset for the construction of a haplotype map68. Initially, SNPs were pruned on the basis of linkage disequilibrium (LD) using PLINK69, with a window size of 10 kb, a window step of one SNP, and an r2 threshold of 0.8. A second round of LD pruning was conducted with a window size of 50 kb, a window step of one SNP, and the same r2 threshold of 0.8. For population LD-based haplotype analysis, the filtered SNPs were phased using Beagle (v.21Apr21.304)70. Subsequently, haplotype blocks were delineated utilizing PLINK with specific parameters (–blocks no-pheno-req –blocks-max-kb 1000 –geno 0.1 –blocks-min-maf 0.05). To merge adjacent blocks maintaining significant LD, D’ statistic values were calculated between all SNPs of consecutive blocks. If the lower quartile (Q1) exceeded 0.98, the adjacent blocks were merged. After filtering for the inbreeding coefficient, HAPPE71 was employed to identify haplotype clusters (haplogroups) for each block.

Construction of mapping and validation populations

JI2822 × JI0816 F2 population

Lines JI0816 and JI2822 (Supplementary Table 18), both of short stature, are maintained in the John Innes Pisum germplasm collection (https://www.seedstor.ac.uk/). JI0816, also known as WBH 1185, has pink flowers, a fasciated stem and yellow pods lacking pod parchment, corresponding to the mutant alleles b, fa, gp and p, respectively. JI2822, a recombinant inbred line derived from the cross JI0015 × JI0399, is wild type at these four loci. JI0015 and JI0816 share the gp allele, indicating that these two lines had a common parent; therefore segments of the genetic map are devoid of segregating alleles. 1,000 F2 seeds from 9 F1 plants (JI2822 × JI0816) were sown at the JIC field station in spring 2022. DNA preps from 942 plants were prepared from individual leaflets using the Qiagen DNeasy protocol (https://www.qiagen.com). Of these, 405 were genotyped using an Axiom SNP array as described49. The phenotypic and genotypic data are available in Supplementary Tables 1820, and the sequences corresponding to the Axiom markers are available in supplementary table 3 of Ellis et al.49.

JI0015 × JI0399 and JI2822 × JI2833

Three populations have been used for mapping Gp. The first to be used was the previously described recombinant inbred population JI0015 × JI0399 (Supplementary Table 21), later genotyped by Neogen, using an Infinium array as described previously51. The second was an F2 population derived from a cross between two of these RILs (JI2822 Gp/Gp and JI2833 gp/gp), which was screened using PCR for markers already mapped in JI0015 × JI0399 in order to identify informative individuals (Supplementary Table 22). These, together with selected RILs with informative recombination events were genotyped with Axiom markers as described elsewhere49,65. Gp also segregates in the JI2822 × JI0816 F2 population as described above. The marker data are available in the supplementary file: Gp mapping in JI0015 × JI0399 (Supplementary Tables 21 and 22).

Other F2 mapping populations and BSA

We selected parental lines with contrasting pairs of traits to map genetic loci of interest in F2 populations using mapping by sequencing72 of bulked segregants. For genetic loci controlling uncharacterized Mendel traits: flower position (axial versus terminal), pod colour (yellow versus green), and pod shape (inflated versus constricted), crosses were made between Caméor (axial) × JI0814 (fasciated) and JI1995 (green pod) × JI2366 (yellow pod). F2 populations for the P/V loci (pod shape) were derived from the cross between JI0074 (P/P v/v) as the male parent and JI1995 (P/P V/V) as the female parent, and the cross between JI2822 × JI0816. F2 populations for the D locus (one (Dco) or two (Dw) axial rings of anthocyanin pigmentation) were derived from three crosses, with JI0191 (Dw), JI0794 (Dw) and JI1669 (Dw) as male parents and JI0328 (Dco) as the female parent. F2 populations for the Fn/Fna loci (flower number per node, fpn) were derived from four crosses, with JI0441 (1fpn), JI2410 (3fpn), JI0745 (2fpn) and JI0746 (3fpn) as male parents and JI1995 (2fpn) as the female parent. The marker and BSA analysis of the F2 population is as described36.

Approximately 300 plants from the F2 population of each of these crosses were planted in Shenzhen, China. Wild-type and mutant bulked DNA samples were prepared by mixing equal amounts of DNA from 30 accessions with the dominant and recessive phenotypes, respectively. DNA was isolated from fresh leaves using the CTAB method73. 50× depth genome sequences for each of the parents and the bulked samples were generated. Short reads were aligned against the ZW6 reference genome using BWA-MEM (v.0.7.17) and SNPs were identified using Samtools (v.1.9). The variation dataset was analysed using the G’s value method of the QTLseqr package (v.0.7.5.2).

Genetic mapping of Gp

Green versus yellow pod colour segregates in the recombinant inbred (RIL) population derived from the cross between JI0015 (gp/gp) and JI0399 (Gp/Gp). The JI0015 × JI0399 RIL population comprises 90 recombinant inbred lines, which, together with their parents, were genotyped using an Infinium array (Neogen) that detected 13,204 biallelic SNPs. This enabled us to position 5,209 PsCam markers on a genetic map (JI0015 × JI0399) and place Gp between the markers PsCam005046 and PsCam056084 (and their co-segregating markers). Additional mapping was undertaken, using an Axiom SNP array with 84,691 features49 of selected F2 progeny of a cross between JI2822 (Gp) and JI2833 (gp) together with RILs from JI0015 × JI0399 known to have recombination events at informative locations. JI2822 and JI2833 are both RILs from the JI0015 × JI0399 population. With respect to the ZW6 assembly18, this placed Gp between the Axiom markers AX-183865165 (chr. 2:320968993) and AX-183571028 (chr. 3:325580858) (JI0015 × JI0399). Analysis of an F2 population derived from crosses between JI2822 (Gp) and JI0816 (gp) placed Gp between the Axiom markers AX-183571050 (chr. 3:321020350) and AX-183879077, (chr. 3:324762848; Supplementary Table 18).

We performed different association genomics analysis for pod colours, including the SNP-based GWAS, LD-based haplotype GWAS, kmer-derived IBS-based haplotype GWAS, and the SV-based GWAS (Supplementary Fig. 14), all resulting in consistent and significant single GWAS peaks for pod colour located in the expected position of Gp, as seen in Manhattan plots (Supplementary Fig. 14).

Allelism tests for gp

Crosses were made between pairs of yellow-podded lines in the JIC germplasm collection (Supplementary Table 18). Seed and vegetative phenotypes were used to identify F1 progeny plants, and those accessions allelic, or non-allelic, to gp were identified by their yellow or green pod colour, respectively.

Near-isogenic lines for Gp versus gp

The JI0015 gp allele was introgressed into the Caméor background by sequential back-crossing and F1 progeny testing using a codominant PCR marker assay with one forward (25994_F) and two reverse (25994_15R and 25994_399R) primers (Supplementary Table 18). Gp (596 bp) and gp (688 bp) alleles were distinguished in a 35 cycle, 10s–30s–60s Touchdown PCR reaction that reduces the initial 62 °C annealing temperature to 50 °C in the first 10 cycles.

Marker development and QTL mapping for PsOs1

The organ size-related quantitative trait locus (PsOs1) was fine-mapped using 21 Kompetitive Allele Specific PCR (KASP) markers for SNPs distinguishing accessions JI0074 and JI1995 after whole-genome resequencing in the candidate region. Each KASP marker was designed with two allele-specific forward primers (Supplementary Table 47) and one common reverse primer, on the basis of 200-bp sequences upstream and downstream of target genic SNPs, following the standards of LGC Biosearch Technologies. The genetic linkage map was constructed using JoinMap v.4.0 software. Windows QTL Cartographer v.2.5 software facilitated inclusive composite interval mapping (ICIM) for identifying and analysing QTLs. A logarithm of odds (LOD) score of ≥3.0 was deemed indicative of a QTL.

Genome-wide association study

The multi-location and multi-season phenotypic dataset was used to perform GWASs with the SNP matrix using GEMMA (v.0.98.1)74, and employing the following parameters (gemma-0.98.1-linux-static -miss 0.9 –gk -o kinship.txt and gemma-0.98.1-linux-static -miss 0.9 -lmm -k kinship.txt). The structural variation matrix was used to test for association with phenotypic variation for each of the selected traits using the same parameters as above. The haplotype map was used to test for association with phenotypic variation for each of the selected traits using RTM-GWAS75 with the following parameters (rtm-gwas-gsc –vcf in.vcf –out out.matrix and rtm-gwas-assoc –vcf in.vcf –covar out.matrix.evec –no-gxe). We used GEMMA’s Wald tests and directly visualized the resulting P values as –log10-transformed values in Manhattan plots. In-house R scripts were employed for data plotting. To identify SNPs of interest, we applied a Bonferroni correction to the significance threshold (α = 0.1) based on the total number of tested variants (9,214,461 SNPs). This yielded a threshold of –log10(0.1/9,214,461), and any SNP surpassing this cutoff was considered noteworthy and highlighted in the Manhattan plots.

Orthologues and gene family analysis

Phylogenetic analyses were conducted on key gene families in this study, such as MYB, CLE and CIK/SERK, following a consistent workflow. Relevant A. thaliana orthologous genes containing the required domains were retrieved from TAIR (https://www.arabidopsis.org), and profile hidden Markov models (HMMs) were constructed using HMMER (v.3.1b1) on the basis of multiple sequence alignments generated by MAFFT (v.7.475). These HMMs were then employed to identify putative homologues in the pea (P. sativum) ZW6 genome. Multiple sequence alignments for each family were trimmed with trimAl (v.1.5.rev.0), and the best-fit amino acid substitution models were selected using ModelTest-NG (v.0.1.7). Phylogenetic trees were constructed by IQ-TREE (v.2.1.2) with 1,000 ultrafast bootstrap replicates. For synteny analysis for each gene family, we used OrthoFinder (v.2.5.4) to identify orthologous clusters among pea and related legumes (for example, Vicia sativa, Medicago truncatula, Cicer arietinum, Lotus japonicus, Vigna radiata, Phaseolus vulgaris and Glycine max), and visualized collinearity blocks with JCVI (v.1.2.7).

Gene functional validation experiments

Fast neutron mutants

Several Fast Neutron mutants from a population described by Domoney et al.76, were included in this project. These were: FN1453/1sil-like; FN1091/4 lacking axil ring pigmentation, allelic to d; FN1218/6 lacking axil ring pigmentation, allelic to d; FN2026/7coch2 candidate; FN2073/5 lacking axil ring pigmentation, not allelic to d; and FN2076/5VicA FN deletion line.

Crosses were made between pairs of lines lacking axil ring pigmentation (Supplementary Fig. 32) to test for complementation. Where possible, vegetative phenotypes were used to identify F1 progeny plants, and those accessions allelic, or non-allelic, to d were identified by the absence, or presence of pigmented axil rings, respectively.

Complementation test

A reverse genetics screen for the ChlG gene in pea was carried out in an ethane methane sulfonate-mutagenised targeting induced local lesions in genomes (TILLING) population in background Caméor26. Line 411.1, with a G>A mutation 1,900 bp after the ATG, resulting in a W121* nonsense mutation, was identified. Eight M4 seeds were sown and seedlings were genotyped with a cleaved amplified polymorphic sequence (CAPS) marker (Supplementary Table 47). No seedlings were homozygous mutants (signified by a single undigested 1,125-bp band), 6 were heterozygous (signified by 3 bands of sizes 1,125 bp, 699 bp and 426 bp) and 2 were homozygous wild type (signified by 2 bands of sizes 699 bp and 426 bp). A complementation test was carried out by crossing heterozygous seedlings with a homozygous JI0015 gp mutant (13 crosses with male JI0015 and F1 identified by long internodes, and 3 crosses with female JI0015 and F1 identified by yellow cotyledons). Nine out of 16 F1 progeny plants had yellow pods, indicating non-complementation.

Virus-induced gene silencing

VIGS in peas was performed using a published methodology as described previously77. To target genes of interest, a 200–500 bp fragment from the CDS region of each gene were amplified. The primers for VIGS constructs, including VIGS-PsChlG, VIGS-PsOs1, VIGS-PsMYB26 and PsMYB16, are provided in Supplementary Table 47. SpeI/XbaI and EcoRI were used to linearize the pCAPE2 vector, which was kindly provided by Li et al.78, and corresponding fragments of gene targets were ligated into the vector to construct the vectors for VIGS assays. For VIGS-PsChlG, the negative control vector, pCAPE2-Con, was constructed in the same way by replacing the PsChlG fragment in pCAPE2-PsChlG with a 529-bp insert derived from a cDNA fragment of Bean Yellow Mosaic Virus (GenBank accession AJ622899). The positive control vector, pCAPE2-PDS, targeting the phytoene desaturase gene, was also provided by Li et al.78. These vectors were transferred into Agrobacterium tumefaciens (GV3101) (Shanghai Weidi Biotechnology) and VIGS assays carried out following the protocol described by Constantin et al.79. In brief, Agrobacterium strains carrying these vectors were shaken separately until OD600 = 1.2, followed by the collection and resuspension of the bacteria in injection buffer (NaCl: 10 mM, CaCl2: 10 mM, acetosyringone: 0.1 mM) to a concentration of OD600 = 1.2. After resting for 2–3 h, the solution of pCAPE2-target gene, pCAPE2-PDS (positive control), and pCAPE2-Con (negative control) was mixed with pCAPE1, separately, in equal proportions, and injected into 10-day-old compound leaves of the acceptant lines (Yunnan2070 or JI1995). Specifically, pCAPE1 and pCAPE2 are plasmid vectors used to induce gene silencing in plants such as M. truncatula and P. sativum. After 24 h of darkness, they were transferred to long day conditions. New leaves of positive control plants bleached in about 10 days, indicating successful silencing of PDS. VIGS was employed with the same procedure for PsMYB16 gene within the D locus and VIGS-MYB26 for the V candidate gene. For PsOs1, which is described in detail below, all the gene-specific primers used for VIGS constructs are listed in Supplementary Table 47.

Transformation, gene overexpression and silencing of PsOs1

The PsOs1 coding sequence of JI0074 was amplified (primers listed in Supplementary Table 47) and integrated into the pCAMBIA1305 vector, resulting in the pCAMBIA1305-PsOs1JI0074 construct. The plasmid was then introduced into A. tumefaciens GV3101, which was subsequently employed to transform A. thaliana (Col-0) via the floral dip technique. T3 generation homozygous transgenic Arabidopsis lines were selected for measurement of thousand-seed weight and the dimensions of elongated siliques.

GUS staining, GFP fluorescence observations and flow cytometry

The pCAMBIA1305-PsOs1JI1995 vector was constructed using the same methodology, with primers detailed in Supplementary Table 47. Both vectors, pCAMBIA1305-PsOs1JI1995 and pCAMBIA1305-PsOs1JI0074, were introduced into the A. tumefaciens strain GV3101. In these experiments, H2B-mCherry served as a nucleus marker. The agrobacteria were resuspended and infiltrated into Nicotiana benthamiana leaf epidermal cells using an infiltration buffer consisting of 10 mM MES (pH 5.6), 10 mM MgCl2, and 150 μM acetosyringone, at an OD600 of 0.8. Fluorescence was observed 48 h after infiltration using a confocal laser-scanning microscope.

To compare the promoter activities of JI0074 and JI1995 alleles of PsOs1, we cloned sequences 3,000-bp upstream of the coding region and inserted them into pCAMBIA1300-GUS, resulting in the constructs ProJI0074-GUS and ProJI1995-GUS. These were expressed in Nicotiana tabacum leaves and subsequently stained using a GUS Staining Kit (Coolaber Biotech). GUS activity was quantified using the GUS Gene Quantitative Detection Kit (Coolaber Biotech). For a detailed examination of PsOs1 expression patterns in Arabidopsis, various Arabidopsis tissues were sampled from ProJI0074-GUS transgenic plants. After ethanol decolourization, observations and photographs were taken under a microscope. Details of the primers used are provided in Supplementary Table 47.

Intact nuclei from pea pods were isolated using LB01 lysis buffer (Coolaber Biotech), followed by RNA removal and subsequent propidium iodide staining. The nuclei were then quantified using a CytoFLEX flow cytometer. A minimum of 20,000 nuclei were counted for each sample, and each experiment was replicated at least three times. Data analysis was conducted using FLOWJO software, and representative images were presented. The endoreduplication index (EI) was calculated using the formula: EI = [(0 × percentage of 2C nuclei) + (1 × percentage of 4C nuclei) + (2 × percentage of 8C muclei) + (3 × percentage of 16C nuclei) + (4 × percentage of 32C nuclei)]/100.

Yeast two-hybrid experiment

Yeast two-hybrid assays were conducted according to the protocols outlined in the Yeast Protocols Handbook (Clontech). The CDS of PsCIK2/3 was cloned into the bait plasmid pBT3-SUC, while the CDS of PsCLV1 or PsCLV2 was cloned into the prey plasmid pPR3-N. The primer sequences used for cloning are provided in Supplementary Table 47. These plasmids were co-transformed into the yeast strain NMY51 in different combinations. Transformants were initially screened on SD/-Trp/-Leu medium to confirm successful co-transformation. Interaction assays were then performed on SD/-Trp/-Leu/-His/-Ade medium containing the chromogenic substrate X-α-Gal at 30 °C to detect protein-protein interactions.

Anatomical studies and TEM

Confocal images were collected with Leica TCS SP8 confocal laser-scanning microscope (Leica). After sampling, the shoot apices of Caméor and fa mutant line JI0814, the young leaves (gp/gp, JI2366; Gp/Gp JI0817), and the pod walls of JI0074 and JI1995 were immediately preserved in formaldehyde/alcohol/acetic acid fixative. Paraffin sectioning was performed following established methodologies. Staining was conducted using safranin and fast green (JI0074 and JI1995) and toluidine blue (Caméor and JI0814). Prepared slides were scanned using a NanoZoomer, and cell quantification was carried out using NDP.view2 software. For pods, the resin block was sliced at 60–80 nm on an ultrathin slicer, and the slices were picked up on a 150-mesh copper mesh. The copper mesh was stained with a 2% uranyl acetate saturated alcohol solution in the dark for 8 min; washed 3 times with 70% alcohol; washed 3 times with ultrapure water; stained with a 2.6% lead citrate solution in the dark for 8 min; washed 3 times with ultrapure water, and slightly dried with filter paper. The copper mesh sections were placed in a copper mesh box and dried overnight at room temperature. The observation was under a transmission electron microscope and images collected for analysis.

For TEM studies, pea leaflets and pods (18 days after flowering) were removed from BC3 S2 gp/gp and Gp/Gp plants, after 9 h of daylight. Tissue (1 mm2) pieces were placed in a solution of 2.5% (v/v) glutaraldehyde in 0.05 M sodium cacodylate, pH 7.3 for fixation. Samples were left overnight at room temperature and then processed for embedding (Leica EM TP embedding machine) by washing out the fixative with three successive 15-min washes in 0.05 M sodium cacodylate, followed by fixation in 1% (w/v) OsO4 in 0.05 M sodium cacodylate for 2 h at room temperature. After three 15 min washes in distilled water, samples were dehydrated in an ethanol series (30%, 50%, 70%, 95% and two changes of 100% ethanol), then infiltrated with LR White resin (London Resin Company) by successive changes of resin:ethanol mixes at room temperature (1:1 for 1 h, 2:1 for 1 h, 3:1 for 1 h, 100% resin for 1 h, then 100% resin for 16 h, and 100% resin for a further 8 h). Samples were polymerized in LR White resin at 60 °C for 16 h, then sectioned with a diamond knife (Leica UC7 ultramicrotome). Ultrathin sections (approximately 90 nm) were placed on 200 mesh formvar and carbon-coated copper grids (Agar Scientific). Sections were stained with 2% (w/v) uranyl acetate for 1 h and 1% (w/v) lead citrate for 1 min, washed in distilled water and air dried. Grids were viewed in a FEI Talos 200 C transmission electron microscope (FEI) at 200 kV and imaged using a Gatan OneView 4K × 4K digital camera (Gatan) to record DM4 files.

In situ hybridization

Tissues were rinsed with PBS and immediately placed in the in situ hybridization fixative solution for more than 12 h. Paraffin section preparation and in situ hybridization of the probes were performed according to standard protocols80. The sequences of the digoxigenin (DIG)-labelled antisense riboprobes used for in situ hybridization are provided in Supplementary Table 47. The hybridization signal was detected with an alkaline phosphatase-conjugated anti-DIG antibody (200-052-156, Jackson ImmunoResearch). Finally, images were obtained and analysed using a Pannoramic MIDI digital slide scanner (3DHISTECH).

RNA-seq, iso-seq and gene expression

RNA extraction and pea transcriptome

We built a transcriptome atlas from the reference line Caméor (Supplementary Table 30), and selected various accessions that display the contrasting pairs of traits studied here (Supplementary Table 31). In China, plant tissues (seed, root, nodule, leaflet, stem, flower, pod, stipule, tendril and apical bud) at different development stages (seedling, flowering and podding) were collected and fixed in Trizol before RNA extraction. Tissues were ground in liquid nitrogen and the FastPure Universal Plant Total RNA Isolation Kit (Vazyme Biotech) was used to extract total RNA, the quality of which was assessed by gel electrophoresis. For each sample, we performed short-read RNA-sequencing using the DNBSEQ Platform at BGI Group to generate 6–8 Gb raw RNA reads for each accession.

At JIC, RNA was prepared from young developing pods (flat pod stage, ~60–70 mm in length) of each of the parental and RI lines derived from the cross between JI0015 (gp/gp) and JI0399 (Gp/Gp). Developing seeds were removed from the pods, which were then rapidly frozen in liquid nitrogen. High-quality RNA lacking genomic DNA was extracted from 97 individual pod samples, using a Spectrum Plant Total RNA Kit (Sigma-Aldrich), and used for PCR with reverse transcription and RNA-seq experiments focussed on the identification and characterisation of gene candidates for gp. For the latter analysis, green-podded and yellow-podded RILs (95 in total) were assigned to three groups for each phenotype, ensuring that lines with contrasting plant phenotypes (e.g. plant height) were randomly distributed among the replicate groups (G1, G2 and G3 for green-podded RILs; Y1, Y2 and Y3 for yellow-podded RILs, with 15–17 RILs per pool). Equal amounts of RNA from every line within a group were pooled. RNA-seq (Illumina HiSeq4000) and initial bioinformatic analyses were carried out by the Earlham Institute.

We performed Iso-seq sequencing for a subset of accessions for the target organ at specific developmental stages. We used the Iso-Seq (v.4.0.0) pipeline to process PacBio SMRT Cell subreads and generate high-quality, full-length transcripts. First, subreads from each SMRT Cell were processed with ccs (v.3.4.1) to produce one circular consensus sequence per zero-mode waveguide, applying a minimum read quality of 0.9. Next, primer removal and demultiplexing were performed with lima (v.2.9.0) in Iso-Seq mode, removing unwanted primer combinations and orienting reads from 5′ to 3′. Full-length reads were then refined by trimming poly(A) tails and removing concatemers. When multiple SMRT cells were sequenced, the resulting full-length non-concatemer (FLNC) BAM files were merged before clustering. Iso-Seq cluster (v.4.0.0) was applied to produce polished consensus transcripts, partitioned into high-quality (HQ) and low-quality (LQ) sets on the basis of predicted accuracy. The final consensus transcripts were aligned to the ZW6 and JI0074 reference genomes using pbmm2 (v.1.14.99). Last, iso-seq collapse was used to collapse redundant isoforms and generate GFF files, which were converted to GFF3 with gffread (0.12.7) for downstream analyses.

RT–qPCR

RT–qPCR was conducted to analyse gene expression. Total RNA was extracted using the FastPure Plant Total RNA Isolation Kit (Vazyme Biotech) following the manufacturer’s instructions, including an on-column DNase I digestion step to remove genomic DNA. Subsequently, 1 μg of RNA was used for cDNA synthesis with the All-in-One First-Strand cDNA Synthesis SuperMix for qPCR kit (TransGen Biotech). Green qPCR SuperMix kit (TransGen Biotech) was used for amplification on a CFX384TM Real-Time System (Bio-Rad). The method of collecting plant material for the detection of P/PsCLE41 and V/PsMYB26 was as follows: (1) pods were collected 10 days after flowering. After removing the seeds, the pods were cut into pieces and frozen in liquid nitrogen; (2) apical buds were collected 14 days after emergence and, after removing the extra young leaves under a microscope, the buds were cut into pieces and frozen in liquid nitrogen; (3) stems were collected 14 days after emergence at the third node from the top of the stem, cut into pieces and frozen in liquid nitrogen; (4) seed cotyledons were collected 12 days after flowering; after removing the testa, the cotyledons were cut into pieces and frozen in liquid nitrogen.

To validate the Gp transcriptional fusion and aberrant transcripts, the total RNA was reverse transcribed to cDNA using HiScript III First Strand cDNA Synthesis Kit (+gDNA wiper, Vazyme Biotech). RT–qPCR analysis was conducted using Taq Pro Universal SYBR qPCR Master Mix (Vazyme Biotech), employing specific primers, with PsACTIN serving as the internal standard. Expression levels of genes were quantified relative to the control based using the 2−ΔΔCT method. All RT–qPCR results represent the mean ± s.d. from three separate biological experiments. The primers used for RT–qPCR primers used are provided in Supplementary Table 47.

DNA methylation sequencing

Bisulfite treatment and libraries were prepared accordingly to the standard protocol of methylation library construction. PE150 sequencing was performed using Illumina NovaSeq X Plus sequencing platform 25B chip. Quality control and adapter trimming of the raw whole-genome bisulfite sequencing (WGBS) data was then performed using Trimmomatic (v.0.39). The resulting WGBS reads were mapped to the JI0074 reference genome using Bismark (v.0.23.0) and PCR duplicates were removed from the aligned reads.

Statistical methods

General statistical analysis

Statistical analyses were conducted in R software suite (v.4.2; https://www.r-project.org/) unless otherwise stated. The correlation between different traits was tested by calculating the coefficients of Pearson correlation, as well as the P values, using the cor.test package, with the method set to ‘Pearson’ for the correlation analyses between quantitative traits. Traits collected at different locations and in different years were analysed by calculating their rank correlations by setting the option ‘method’ to ‘Spearman’. The correlation between qualitative traits was assessed using the chi-square test using the ‘chisq.test’ package in R. Gene expression levels in different lines or organs under different treatments or at different developmental stages (at least three biological replicates for each sample) were analysed using DESeq2 (ref. 81), in which the genes with a false discovery rate (Bonferroni) lower than 0.01 were defined as significantly regulated genes, unless there is an alternative explanation in a specific legend. The min–max scaling (normalization) approach was used to calculate the expression level for comparison of each gene across stages and organs by the formula: Xscaled = (X − Xmin)/(Xmax − Xmin).

PCA (main text Fig. 1) was performed on the PLINK distance matrix using an Excel add-in downloaded from RIKEN, now available at https://systemsomicslab.github.io/compms/others/main.html#Statistics.

Statistics and reproducibility

All experiments were designed with explicit consideration of statistical power and reproducibility. Each experiment was independently repeated at least three times (biological replicates) with consistent results across repetitions, including microscopy analyses used at least three samples or technical replicates per experimental condition (such as Fig. 3b,c,g and Extended Data Figs. 4a–c,e and 6b,h,i); phenotypic observations were validated across ≥3 independent growth cycles (Norwich, Shenzhen, Harbin, multiple sites and multiple years). Micrographs shown are representative of at least three independent experiments. Complete protocols and raw data supporting the conclusions are available in the Data availability and Supplementary Information. Data are presented as mean ± s.e.m., and statistical significance was determined using two-tailed Students’ t-tests (‘t.test’ package in R software v.4.2) in the analyses of the phenotypes, such as seed weight and pod width, between different accessions.

Population structure analysis

The core high-quality SNP dataset was used for population structural analyses. PCA and t-distributed stochastic neighbour embedding analyses were first performed using beta Python modules sklearn.decomposition and sklearn.manifold. ADMIXTURE82 (version 1.3.0) was employed to analyse the population structure, with K increasing from 2 to 16. The Q value represents the estimated proportion of an individual’s genome that originates from each inferred ancestral population cluster.

Genetic differentiation (Fst) and nucleotide diversity (π) were calculated with VCFtools (version 0.1.13). Fst scores were calculated within nonoverlapping 100-kb windows and π was calculated for each individual site and averaged across the genome for each group. LD was calculated on SNP pairs within a 500-kb window using PopLDdecay83 (version 3.31; https://github.com/BGI-shenzhen/PopLDdecay) and the decay was measured by the distance at which the Pearson’s correlation efficient (r2) dropped to half of the maximum. Splits tree analysis of the PLINK distance matrix was performed using SplitsTree4 (ref. 61).

Germplasm availability

All the germplasm described and used in this work is available to order from the John Innes Centre Germplasm Resources Unit (https://www.seedstor.ac.uk/). The 697 sequenced single seed descent lines derived from John Innes Pisum Germplasm accessions, were imported in 2019 to the Agricultural Genomics Institute at Shenzhen (AGIS), Chinese Academy of Agricultural Sciences (CAAS), where they have been grown and phenotyped annually since then.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

RELATED ARTICLES

Most Popular

Recent Comments