Friday, January 10, 2025
No menu items!
HomeNatureA rare PRIMER cell state in plant immunity

A rare PRIMER cell state in plant immunity

Reagents and kits

The following reagents and kits were used: ammonium persulfate (Sigma, 09913); BSA solution (Sigma, A1595); Chromium Next GEM Single Cell Multiome ATAC + Gene Expression kits (10x Genomics, PN-1000283); Corning Falcon cell strainers (Corning, 08-771-2); dithiothreitol (Thermo, R0861); EDTA, pH 8.0 RNase-free (Invitrogen, AM9260G); KCl (2 M) RNase-free (Invitrogen, AM9640G); MACS SmartStrainers (Milteny Biotec, 130-098-458); MERSCOPE 500 Gene Imaging kits (Vizgen, 10400006); MERSCOPE 500 Gene Panel (Vizgen, 10400003); MERSCOPE Sample Prep kits (Vizgen, 10400012); N,N,N′,N′-tetramethylethylenediamine (Sigma, T7024-25ML); NaCl (5 M) RNase-free (Invitrogen, AM9759); NP40 (IGEPAL CA-630) (Sigma, I8896); paraformaldehyde (Sigma, F8775); PBS (10×) pH 7.4 RNase-free (Thermo Fisher, AM9625); protease inhibitor cocktail (Sigma, P9599); Protector RNase inhibitor (Sigma, 3335402001); Scigen Tissue-Plus OCT compound (Fisher, 23-730-571); spermidine (Sigma, S2626); spermine (Sigma, 85590); Triton-X (Sigma, 93443); and UltraPure 1 M Tris-HCI buffer pH 7.5 (Invitrogen, 15567027).

Gene symbols, names and ordered locus names

The following genes were highlighted in this study: ALD1 (AGD2-LIKE DEFENCE RESPONSE PROTEIN1, At2g13810); BCA2 (BETA CARBONIC ANHYDRASE 2; At5g14740); BON3 (BONZAI3; At1g08860); CBP60g (CALMODULIN BINDING PROTEIN 60-LIKEG; At5g26920); FDH (FIDDLEHEAD; At2g26250); FMO1 (FLAVIN-DEPENDENT MONOOXYGENASE1; At1g19250); GT-3A (TRIHELIX DNA-BINDING FACTOR (GT FACTOR) BINDING3A; At5g01380); ICS1 (ISOCHORISMATE SYNTHASE1; At1g74710); LSD1 (LESION SIMULATING DISEASE1; At4g20380); HSFB2b (HEAT SHOCK TRANSCRIPTION FACTOR B2b; At4g11660); ILL6 (IAA-LEUCINE RESISTANT (ILR)-LIKE GENE 6; At1g44350); MAM1 (METHYLTHIOALKYLMALATE SYNTHASE1; At5g23010); WRKY8 (At5g46350); and WRKY46 (At2g46400).

Plant growth and bacterial infection for single-cell and spatial analyses

A.thaliana Col-0 was grown in a chamber at 22 °C with a 12-h light period and 60–70% relative humidity for 30–31 days. Bacterial strains were cultured in King’s B liquid medium with antibiotics (rifampicin and tetracycline) at 28 °C. Three bacterial strains—P.syringae pv. tomato DC3000 with an empty vector (pLAFR3), avrRpt2 (pLAFR3) and avrRpm1 (pLAFR3)—have been previously described9,10,11. Bacteria were collected by centrifugation and resuspended in sterile water to an OD600 of 0.001 (approximately 5 × 105 c.f.u. ml−1). In total, 20 A. thaliana leaves (4 fully expanded leaves per plant) were syringe-inoculated with bacterial suspensions using a needleless syringe. Syringe infiltration was performed on one or two corners of each leaf, and bacterial suspensions were spread throughout the entire leaf. We chose four different time points (4, 6, 9 and 24 h), representing early stages of infection and when dynamic transcriptional reprogramming was observed in a previous study that used bulk RNA-seq40. For each strain, four time points were sampled at the same time of day by infiltrating bacteria at different times to minimize the influence of circadian rhythms. The 20 infected leaves were collected using forceps and immediately processed for extraction of nuclei. For the mock condition, water-infiltrated leaves were collected after 9 h.

Generation of transgenic plants and pathogen-infection assays

pWAT206 was a gift from A. Takeda. The following plasmids were constructed using HiFi DNA assembly kits (New England Biolabs) and were verified by Sanger sequencing. Three PCR fragments were amplified from pBICRMsG41 using primer pairs (mEGFP-Nter_1_F plus mEGFP_1_R; mEGFP_2_F plus mEGFP_2_R; and mEGFP_3_F plus mEGFPNter_3_R) and were then assembled into StuI/AscI-digested pBICAscII42. The resulting plasmid was used as a template for two PCR reactions using primer pairs mEGFP_Nter_1_F plus mEGFP_4R and mEGFP_4F plus mEGFP_Nter_3R. These PCR products were assembled into StuI/AscI-digested pBICAscII, which produced pBIC_mEGFP_Nter. A PCR fragment was amplified from pBIC_mEGFP_Nter using mEGFP_Cter_F1 plus mEGFP_Cter_R, and then used as a template for PCR using mEGFP_Cter_F2 plus mEGFP_Cter_R. The resulting PCR fragment was assembled into StuI/AscI-digested pBIC_mEGFP_Nter to obtain pBIC_mEGFP_Cter. The coding sequence of GT-3A (At5g1380) was amplified by PCR using the primer pair GT3a_mEGFP_F plus GT3a_mEGFP_R and was assembled into StuI-digested pBIC_mEGFP_Cter to obtain pBIC_GT3a_mEGFP. The sequence encoding carboxy-terminally mEGFP-fused GT-3A was amplified from pBIC_GT3a_mEGFP by PCR using a primer pair (35Sp_HiFi_F and 35Ster_HiFi_R) and was then assembled into XhoI/XbaI-digested pWAT206 to obtain pWAT206_GT3a_mEGFP. A.thaliana Col-0 plants were transformed using Agrobacterium tumefaciens GV3101 (pMP90, pSoup) with pWAT206_GT3a_mEGFP as previously described43. Two independent Basta-resistant transgenic lines were used for bacterial growth assays with a bioluminescent P.syringae pv. tomato DC300043 strain and for lesion development analysis using C.higginsianum MAFF 305635. Plants were grown in a climatic chamber with a temperature of 22 °C, 60% relative humidity and light intensity of 6,000 lux (about 100 μmol m–2 s–1) for 10 h. Bacterial suspensions at OD600 = 0.001 in sterile water were syringe-infiltrated into leaves of 4–5-week-old plants. Bacterial growth was measured as bioluminescence using a GloMax Navigator Microplate luminometer (Promega) as previously described44. For lesion development analysis, leaves of 4–5-week-old plants were drop-inoculated with a 5 µl conidial suspension of C.higginsianum (1 × 105 conidia per ml). The inoculated plants were kept under high humidity in a climatic chamber with a temperature of 22 °C and light intensity of 6,000 lux for 10 h. Lesion size was measured at 6 days after inoculation using ImageJ.

Generation of gt3a-KO plants

The gt3a-KO plants were created by genome editing with CRISPR–Cpf1. The plasmid pWAT235-Cas9-HF-cc was provided by A. Takeda. The Cpf1 sequence was amplified from pY004 (Addgene, 69976) by PCR using a primer pair (SpeI-NLS-Cpf1-F and BamHI-NLS-Cpf1-R; Supplementary Table 2), followed by digestion with SpeI and BamHI. The digested DNA fragment was ligated to pWAT235-Cas9-HF-cc, which had been digested with SpeI and BamHI, resulting in pWAT235-Cpf1-cc. Two PCR fragments were amplified from pWAT235-Cpf1-cc using primer pairs Gb-ccdB-F and Gb-ccdB-R, and Gb-U626-F and Gb-U626-R. These PCR fragments and BsaI-digested pWAT235-Cas9-HF-cc were assembled into a single plasmid using a HiFi DNA assembly kit, which resulted in pWAT235-Cpf1-U626-cc-polyT. Two oligonucleotides, GT3a_gRNA2_F and GT3a_gRNA2_R, were hybridized and ligated to BsaI-digested pWAT235-Cpf1-U626-cc-polyT, which resulted in pWAT235-Cpf1-U626-GT3a-gRNA2-polyT. A.thaliana Col-0 plants were transformed using A.tumefaciens GV3101 (pMP90, pSoup) with pWAT235-Cpf1-U626-cc-polyT as previously described43. A transgenic line with a premature stop codon due to a 5-bp deletion in the GT-3A gene was selected and used throughout this study.

Bulk RNA-seq of plants

Leaves of wild-type plants and one of the transgenic lines expressing GT-3A–mEGFP were syringe-infiltrated with water or DC3000 at an OD600 of 0.001 and were collected 24 h after infiltration. Total RNA was extracted using TRI Reagent (Sigma). One microgram of total RNA was used for library preparation using the BrAD-seq method to create strand-specific 3′ digital gene-expression libraries45. The libraries were sequenced on a DNBSEQ-G400 platform at BGI, which produced 100-bp end reads. Because the quality of reverse reads was poor due to the poly(A) sequence, only forward reads were used for analysis. Trimming of the first 8 bases and adaptors and quality filtering were performed using fastp (v.0.19.7)46 with the parameters -x -f 8 -q 30 -b 50. The trimmed and quality-filtered reads were mapped to the Arabidopsis genome (TAIR10) using STAR (v.2.6.1b)47 with default parameters and transformed to a count per gene per library using featureCounts (v.1.6.0)48. Statistical analysis of the RNA-seq data was performed in the R environment (v.4.1.3). Because the BrAD-seq method involves poly(A) enrichment, mitochondrial and chloroplast genes were excluded. Genes with mean read counts of fewer than ten per library were excluded from the analysis. The resulting count data were subjected to TMM normalization using the function calcNormFactors in the package edgeR, followed by log transformation by the function voomWithQualityWeights in the package limma. To each gene, a linear model was fit using the function lmFit in the limma package. For variance shrinkage in the calculation of P values, the eBayes function in the limma package was used. The resulting P values were then corrected for multiple hypothesis testing by calculating Storey’s q values using the function qvalue in the package qvalue. To extract genes with significant changes in expression, cut-off values of q < 0.05 and |log2-transformed fold change| > 1 were applied.

Extraction of nuclei and single-nucleus sequencing

Fresh nucleus purification buffer (NPB; 15 mM Tris pH 7.5, 2 mM EDTA, 80 mM KCl, 20 mM NaCl, 0.5 mM spermidine, 0.2 mM spermine, 1:100 BSA and 1:100 protease inhibitor cocktail) was prepared before the experiment and chilled on ice. All the subsequent procedures were performed on ice or at 4 °C. Twenty leaves were chopped in 500–1,000 µl cold NPB with 1:500 Protector RNase IN with a razor blade on ice for 5 min to release nuclei and then incubated in 20 ml NPB. The crude extract of nuclei was sequentially filtered through 70-µm and 30-µm cell strainers (70 µm, Corning Falcon cell strainers, Corning, 08-771-2; 30 µm, MACS SmartStrainers, Milteni Biotec, 130-098-458). Triton-X and NP40 were added to the extract to a final concentration of 0.1% each, and the extract was incubated at 4 °C for 5 min with rotation. The suspension was centrifuged at 50g for 3 min in a swing-rotor centrifuge to pellet non-nucleus debris and the supernatant was recovered. The nuclei were pelleted by centrifugation at 500g for 5 min in a swing-rotor centrifuge. When the pellet was green, the pellet was resuspended in 20 ml NPBd (NPB with 0.1% Triton-X and 0.1% NP40) with 1:1,000 Protector RNase IN by pipetting, followed by centrifugation at 500g for 5 min. When the pellet was translucent, the NPBd wash was skipped. The pellet was then washed by resuspending it in 20 ml NPB with 1:1,000 Protector RNase IN and centrifuging at 500g for 5 min in a swing-rotor centrifuge. The pellet was resuspended in 950 µl of 1× Nuclei Buffer (10x Genomics, PN-2000207) with 1:40 Protector RNase IN and 1 mM dithiothreitol. The suspension of nuclei was centrifuged at 50g for 3 min in a swing-rotor centrifuge to pellet non-nucleus debris and the supernatant was recovered. This step was repeated one more time. The resulting nuclei were manually counted using a haemocytometer. Nuclei were pelleted by centrifugation at 500g for 5 min in a swing-rotor centrifuge and the supernatant was removed, leaving approximately 10 µl of the buffer. Nuclei were counted again, and up to 16,000 nuclei were used for subsequent steps. However, in most samples, we did not load the maximum number of nuclei that the 10x Genomics kit accepts to avoid the risk of clogging the instrument, which can result in variable numbers of recovered nuclei among samples (Extended Data Fig. 1b). scRNA-seq and ATAC–seq libraries were constructed according to the manufacturer’s instructions (10x Genomics, CG000338). scRNA-seq libraries were sequenced using an Illumina NovaSeq 6000 in dual-index mode with ten cycles for i7 and i5 indices. snATAC–seq libraries were also sequenced using an Illumina NovaSeq 6000 in dual-index mode with 8 and 24 cycles for the i7 index and the i5 index, respectively.

Single-cell multiomic analysis

Raw data processing

Sequence data were processed to obtain single-cell feature counts by running cellranger (v.6.0.1) and cellranger-arc (v.2.0.0) for snRNA-seq data and snATAC–seq data, respectively. For snRNA-seq, the –include-introns option was used to align reads to the A.thaliana nuclear transcriptome built using the TAIR10 genome and the Araport 11 transcriptome. The chloroplast genome was removed from the reference genome for the analysis of both snRNA-seq and snATAC–seq data. The A.thaliana TAIR10 genome was downloaded from https://plants.ensembl.org/, and the chloroplast genome was manually removed. The A.thaliana TAIR10 gene annotation file was manually modified by removing chloroplast genes and replacing semicolons in the ‘gene_name’ column with hyphens to prevent errors during cellranger-arc processing. We note that a mean of 23.4% and 26.0% of reads were mapped on the chloroplast genome in snRNA-seq data and snATAC–seq data, respectively. Removing the chloroplast genome from the reference did not affect the overall RNA and ATAC count distribution. Count data were analysed using the R packages Seurat (v.5)49 and Signac50.

Quality control and cell filtering

Before integrating the datasets, doublets were predicted using DoubletFinder51 and filtered out. Quality control matrices for snATAC–seq were generated using a modified version of the loadBEDandGenomeData function in the R package Socrates52. ACRs were identified using MACS2 (ref. 53) with the following parameters: -g (genomesize)=0.8e8, shift=−50, extsize=100, and –qvalue=0.05, –nomodel, –keep-dup all. The fraction of reads mapping to within 2 kb upstream or 1 kb downstream of the transcription start site (TSS) was calculated. Nuclei were filtered using the following criteria: 200 < RNA UMI count < 7,000; RNA gene count > 180; 200 < ATAC UMI count < 20,000; and fraction of RNA reads mapped to mitochondrial genome < 10%. Seurat objects of individual samples were merged using the Merge_Seurat_List function of the scCustomize package.

snRNA-seq clustering

snRNA-seq clustering was performed using the R package Seurat. The cell-by-gene RNA count matrix was normalized using SCTransform. Dimension reduction was performed using principal component analysis with RunPCA. Technical variance among samples was reduced using Harmony54 using principal components (PCs) 1–20. Graph-based clustering was performed on the Harmony-corrected PCs 1–20 by first computing a shared nearest-neighbour graph using the PC low-dimensional space (with k = 20 neighbours). Louvain clustering (resolution = 1.0) was then applied, and the clusters were projected into an additionally reduced space using UMAP (n.neighbours=20 and min.dist=0.01).

snATAC–seq peak calling

Peaks were called independently on each cluster defined by snRNA-seq data and then combined using the CallPeaks function of Signac, which uses MACS2 with the following parameters: effective.genome.size=1.35e8, extsize=15, shift=−75. Peak counts were quantified using the FeatureMatrix function. Compared with the default peak calling pipeline of cellranger-arc (24,394 peaks), this cluster-specific peak calling approach was able to capture more peaks (35,560 peaks), which is consistent with a previous report50.

snATAC–seq clustering

Dimensionality reduction was performed using latent semantic indexing (LSI)55. First, the top 95% most common features were selected using the FindTopFeatures function. Then, the term-frequency inverse-document-frequency (TF-IDF) was computed using RunTFIDF with scale.factor=100,000. The resulting TF-IDF matrix was decomposed with singular value decomposition with RunSVD, which uses the irlba R package. Technical variance among samples was reduced using Harmony with LSI components 2–10. Graph-based clustering was performed on the Harmony-corrected LSI components 2–20 by first computing a shared nearest-neighbour graph using the LSI low-dimensional space (with k = 20 neighbours). Louvain clustering (resolution = 0.8) was then applied, and the clusters were projected into an additionally reduced space with UMAP (n.neighbours=30L and min.dist=0.01).

RNA–ATAC joint clustering

The two modalities were integrated by weighted nearest-neighbour analysis using FindMultiModalNeighbors of Seurat with Harmony-corrected PCs 1–20 for RNA and Harmony-corrected LSI components 2–20 for ATAC. Then, SLM (resolution = 0.5) was applied and projected with UMAP (n.neighbours=30L and min.dist=0.1).

ATAC gene activity score

ATAC gene activity score was calculated using the GeneActivity function of Signac with extend.upstream=400.

Peak-to-gene linkage analysis

LinkPeaks of Signac was used to call significant peak-to-gene linkage for each infection condition (mock, DC3000, AvrRpt2 and AvrRpm1). Background-corrected Pearson’s correlation coefficients between the gene expression of each gene and the accessibility of each peak within 500 kb of the gene TSS were calculated. A P value was calculated for each peak–gene pair using a one-sided z-test, and peak–gene pairs with P < 0.05 and a Pearson’s correlation coefficient of >0.05 were retained as significant links.

Motif enrichment analysis

Motifs present in the JASPAR2020 database56 for Arabidopsis (species code 3702) were used. Cluster-specific peaks were first identified using FindMarkers of Seurat with default parameters. A hypergeometric test was used to test for the over-representation of motifs in the set of differentially accessible peaks using FindMotifs of Signac. Motif plots were generated using MotifPlot. Motif enrichment scores (motif deviation scores) of individual TF motifs in individual cells were calculated using chromVAR17. For the integration of motif enrichment scores and mRNA expression, motif names in JASPAR2020 were matched with gene names. Motifs that could not be uniquely associated with genes were removed from analysis.

TF target prediction

To predict genes that are regulated by a TF, ACRs containing each TF motif were extracted. Then, genes for which expression correlated with these ACRs were identified. We considered genes within 5 kb from an ACR with a linkage score of >0.1 as significant candidates. GO enrichment analysis was performed for these candidates for each TF using the enrichGO function of clusterProfiler with org.At.tair.db annotation. For the analysis in Fig. 2h, a more stringent threshold (linkage score > 0.2) was applied. GO enrichment plots were created using ggplot2 (ref. 57).

Subcluster analysis

Nuclei with the same subcluster label were aggregated, and log2-transformed transcripts per million (TPM) values were calculated. Subclusters with more than 18,000 undetected genes were removed from the analysis. For the subclusters that passed the filtering step, genes were clustered using k-mean clustering with k = 12 (determined using the elbow method) (Extended Data Fig. 3b). Then, GO enrichment analysis was performed for the genes in each cluster (Extended Data Fig. 3b). Three clusters (1, 5 and 8) showed enrichment of an immunity-related function (responses to SA); genes in these clusters were defined as ‘putative immune genes’ and used for downstream analysis.

Pseudotime analysis

To calculate pseudotime, the cell-by-gene matrix for cells annotated as mesophyll was obtained from the snMultiome data. We used the function scanpy.tl.dpt with n_dcs=2. Pseudotime trajectories were constructed with each mesophyll cell in mock samples being used as a starting cell. Then, these individual trajectories were averaged in each cell to create a single, unified pseudotime trajectory. Heat map gene trends for a given gene were calculated by fitting a linear GAM using the pygam function LinearGAM withs (0, lam=400) to fit a GAM to RNA expression levels across sorted pseudotime values.

Comparisons between PRIMER and bystander cells

Among the subclusters of immune-active mesophyll cells in the snMultiome data, PRIMER and bystander cell clusters were defined on the basis of expression of the marker genes BON3 and ALD1, respectively. By comparing these cell states, DEGs were identified using the FindMarkers function of Seurat. Motif enrichment analysis was performed on ACRs within 2 kb of the cell-state-enriched DEGs.

To assess the contribution of CAMTA3 in different cell states, genes regulated by CAMTA3 were first identified using a published bulk RNA-seq dataset39, comparing wild-type and camta3-D (a dominant negative mutant of CAMTA3). Genes suppressed by CAMTA3 after ETI (triggered by AvrRpm1 or AvrRps4 infection) were overlapped with the cell-state DEGs defined above. Hypergeometric tests were performed to assess the significance of the overlaps.

snRNA-seq analysis of gt3a-KO plants

snRNA-seq was performed on gt3a-KO plants infected by AvrRpt2 or treated with water (mock) at 9 and 24 h.p.i. Two independent replicates were prepared for each infection condition. The data were filtered using the same criteria as for the snMultiome analysis without considering ATAC–seq information. The gt3a-KO snRNA-seq data were integrated with the Col-0 snMultiome data (mock and AvrRpt2 conditions), and de novo clustering was performed in the same way as described above. On the basis of immune-gene expression, immune-active mesophyll clusters were identified and further subclustered. From these subclusters, PRIMER and bystander cells were defined on the basis of expression of the marker genes BON3 and ALD1, respectively. For each cell state, DEG analysis was performed comparing Col-0 and gt3a-KO plants to assess the effect of the GT-3A mutation.

Comparisons between single-cell and bulk omics datasets

Our snATAC–seq data were compared with published bulk ATAC–seq data of mature A.thaliana leaves activating pattern-triggered immunity (PTI), ETI or both PTI and ETI, as well as non-immune-active leaves12. All ACRs identified in the bulk ATAC–seq datasets were combined and compared with the ACRs identified in our snATAC–seq datasets.

MERFISH

MERFISH panel design

We curated 500 target genes that included the following genes: (1) previously defined markers of A.thaliana leaf cell types58; (2) genes involved in various processes such as immunity, hormone pathways and epigenetic regulation; and (3) a variety of TFs previously analysed using DAP-seq (a TF–DNA interaction assay)59. Genes for which more than 25 specific probes could not be designed based on probe design software from Vizgen were excluded from the target gene panel. Highly expressed genes could cause the overcrowding of smFISH signals and hinder MERFISH quantification. To avoid including highly expressed genes in the panel, we assessed target gene expression by using a publicly available bulk RNA-seq dataset of A.thaliana infected by AvrRpt2 in the same setup as the current study at eight different time points (1, 3, 4, 6, 9, 12, 16 and 24 h)40. For each gene, the highest expression value among the eight time points was used. Genes that showed TPM values of >710 were not included in the panel. The total TPM of the 500 genes was approximately 22,000. ICS1 was targeted with a single round of smFISH as this gene is highly expressed. The smFISH result was provided as an image without quantitative information. Bacterial cells were visualized by targeting 19 highly expressed genes (based on previously published in planta bulk RNA-seq data60) as a single target. Supplementary Table 1 has a list of the genes targeted by MERFISH. All the probes were designed and constructed by Vizgen.

Tissue sectioning, fixation and mounting

Plants were grown according to the methods described above. Leaves matching the aforementioned treatments and time points were excised and immediately incubated and acclimated in OCT (Fisher) for 5 min. Following incubation, the leaves were immediately frozen as previously described61. Tissue blocks were acclimated to −18 °C in a pre-cooled cryostat chamber (Leica) for 1 h. Tissue blocks were trimmed until the tissue was reached, after which 10-µm sections were visually inspected until the region of interest was exposed. Sample mounting and preparation were performed according to the MERSCOPE user guide, but with slight modifications. In brief, a 10-µm section was melted and mounted onto a room-temperature MERSCOPE slide (Vizgen, 20400001), placed into a 60-mm Petri dish and re-frozen by incubation in the cryostat chamber for 5 min. Subsequent steps were performed with the mounted samples in the Petri dish. The samples were then baked at 37 °C for 5 min and were incubated in fixation buffer (1× PBS and 4% formaldehyde) for 15 min at room temperature. Samples were then washed with 1× PBS containing 1:500 RNase inhibitor (Protector RNase inhibitor, Millipore Sigma) for 5 min at room temperature in triplicate. Following the final PBS wash, samples were dehydrated by incubation in 70% ethanol at 4 °C overnight.

MERFISH experiment

Tissue sections were processed following Vizgen’s protocol. After removing 70% ethanol, the sample was incubated in the sample prep wash buffer (PN20300001) for 1 min and then incubated in the formamide wash buffer (PN20300002) at 37 °C for 30 min. After removing the formamide wash buffer, the sample was incubated in the MERSCOPE Gene Panel mix at 37 °C for 42 h. After probe hybridization, the sample was washed twice with the formamide wash buffer at 47 °C for 30 min and once with the sample prep wash buffer at room temperature for 2 min. After the washing step, the sample was embedded in hydrogel by incubation in the gel embedding solution (gel embedding premix (PN20300004), 10% (w/v) ammonium persulfate solution and N,N,N′,N′-tetramethylethylenediamine) at room temperature for 1.5 h. Then, the sample was cleared by first incubating it in digestion mix (digestion premix (PN 20300005) and 1:40 protector RNase inhibitor) at room temperature for 2 h, followed by incubation in the clearing solution (clearing premix (PN 20300003) and proteinase K) at 47 °C for 24 h and then at 37 °C for 24 h. The cleared sample was washed twice with the sample prep wash buffer and stained with DAPI and PolyT staining reagent at room temperature for 15 min. Samples were then washed with the formamide wash buffer at room temperature for 10 min and rinsed with the sample prep wash buffer. The sample was imaged using a MERSCOPE instrument, and detected transcripts were decoded on the MERSCOPE instrument using a codebook generated by Vizgen. Transcripts were visualized using Vizgen’s Visualizer.

MERFISH segmentation and processing

Cell-boundary segmentation was performed for each MERSCOPE data output. DAPI-targeting and poly(A)-targeting probes demonstrated variable success in staining nuclei and cytoplasm, respectively, depending on the samples and tissue regions examined (Fig. 3b and Extended Data Fig. 7e show failed and successful segmentation, respectively). By contrast, dense transcript areas marked nucleus locations more robustly (Extended Data Fig. 7e). Therefore, a transcript-based segmentation method was used. For each sample, a two-dimensional Numpy array of zeroes was generated, which modelled the total pixel area imaged. The coordinates of identified RNA transcripts were changed from 0 to 1 in this array. Next, the array was blurred using the cv2.GaussianBlur function in OpenCV with ksize=(5,5). The resulting array was chunked into 2,000 × 2,000 pixel regions. These regions were loaded into Cellpose62, a deep-learning-based segmentation tool, and a custom segmentation model was trained by manually segmenting nucleus objects across ten 2,000 × 2,000 pixel regions in the Cellpose GUI.

The custom model was then used to predict the segmentation boundaries in all the remaining regions, with the parameters diameter=22.92, flow_threshold=0.7 and cell probability threshold=−2. Next, the total number of cells in an experiment was calculated by summing the number of unique cells across all regions. This number was then used to initialize the –num-cells-init command in another segmentation tool, Baysor63, which considers the joint likelihood of transcriptional composition and cell morphology to predict cell boundaries. Baysor was run using a downloaded Docker image and parameters -s 250, –n clusters 1, -i 1, –force-2d, min-molecules-per-gene=1, min-molecules-per-cell=50, scale=250, scale-std=“25%”, estimate-scale-from-centers=true, min-molecules-per-segment=15, new-component-weight=0.2, new-component-fraction=0.3.

To test the quality of our transcript-based segmentation method, we used an FOV with successful DAPI staining (which was rare in our samples) and performed DAPI-based watershed segmentation and transcript-based segmentation (Extended Data Fig. 7d). Results from these two segmentation strategies agreed with each other in general, with the transcript-based approach capturing transcripts in the cytoplasm in addition to those in the nucleus (Extended Data Fig. 7d–f). This result indicated that our segmentation approach can reliably capture cells.

After Baysor segmentation, a cell-by-gene matrix was created from the transcript cell assignments. Cells with fewer than 50 assigned transcripts were removed. Scanpy was used for post-processing of our MERFISH experiments. After loading the respective cell-by-gene matrix into an Anndata object for each experiment, we stored the spatial coordinates of each cell obtained from Baysor. The individual transcript counts in each cell were normalized by the total number of transcript counts per cell. The Anndata cell-by-gene matrix was then log-scaled.

MERFISH–snMultiome data integration and label transfer

To integrate the MERFISH experiments with each other, we used FindIntegrationAnchors followed by the IntegrateData function in Seurat (v.5). To integrate the MERFISH experiment data with our snMultiome data, we integrated the corresponding time points in each modality separately. For each time point in the infection data (mock, 4, 6, 9 and 24 h), we used gimVI from scvi-tools to project both the snMultiome and MERFISH cells into the same latent space using their RNA expression counts. gimVI was trained over 200 epochs with a size 10 latent space for each time point. To transfer continuous observations (pseudotime, RNA and ATAC gene activity counts, and motif enrichment) from the snMultiome data to the MERFISH data, we assigned the numerical average of the nearest 30 snMultiome cells in the gimVI latent space for each MERFISH cell. Similarly, to transfer categorical observations, including cell types and cluster labels, we assigned each MERFISH cell the most common label in the set of its 30 nearest snMultiome cells. To plot the joint embedding of both modalities (MERFISH and snMultiome) for a single time point, we ran UMAP on the gimVI latent space with n_neighbors=30 and min_dist=0.1. Note that in the spatial mapping of cell types predicted in snMultiome (Fig. 4c), although the section was in the middle of the leaf, some epidermal cells were included because the section was not completely flat.

smFISH quantification and bacterial colony identification

Quantification of transcripts labelled by smFISH was performed using the Python package Big-FISH. Seven z planes of MERSCOPE smFISH images were projected into two dimensions by using numpy.max along the z axis and chunked into 2,000 × 2,000 regions. Spots were then called using the function bigfish.detection.detect_spots with threshold=50, spot_radius=(10, 10) and voxel_size=(3, 3). To identify bacterial colonies, bigfish.detection.detect_spots was called to label ‘bacterial meta gene’ locations with threshold=200, log_kernel_size=(1.456, 1.456) and minimum_distance=(1.456, 1.456). To account for autofluorescence from the plant tissue, the same spot-caller was used to call spots on DAPI and ICS1 channels. The spots from all three channels were aggregated, and DBSCAN from scikit-learn.cluster was used on all spots with eps=35 and min_samples=5. We kept all DBSCAN clusters for which the DAPI and ICS1 spots constituted less than 30% of the total cluster spots. These remaining clusters represented potential bacterial colonies. We manually evaluated each cluster to merge those marking the same colony and removed the clusters marking obvious autofluorescence, for example, signals from stomata. Windows of 300 × 300 pixels were generated around each final cluster, and detect_spots was used with threshold=95, spot_radius=(17, 17) and voxel_size=(3, 3) for accurate quantification of individual bacteria per cluster.

Bacterial neighbourhood analysis

To determine the level of immunity of the neighbourhood around each bacteria colony, we identified the 1–1,000 nearest cells in proximity to the colony. Then, the smoothed imputed pseudotime values of these neighbouring cells were averaged to obtain a single mean pseudotime value for each neighbour size. Error curves were calculated using the standard error divided by the mean at each neighbourhood size. These values serve as indicators of the overall immunity level of the area around each colony.

Spatial differential expression analysis

To identify genes that are spatially associated with immune-rich areas (related to Fig. 5a), we first smoothed our imputed spatial pseudotime over the spatially nearest 100 cells to find areas of immune-active hotspots. We then used RCTD64 to deconvolve doublets in our spatial data using the cell types in our snMultiome data as a reference. We created a reference object with min_umi = 15 and ran RCTD on our MERFISH data with the parameters gene_cutoff = 0.0001, gene_cutoff_reg = 0.0001, fc_cutoff = −3, fc_cutoff_reg = −3 and doublet_mode = ‘doublet’. We then ran C-SIDE20 of spacexr with run.CSIDE.single with cell_type_threshold = 0 to find the top spatially differentially expressed genes per cell type along the smoothed spatial pseudotime (explanatory variable) and plotted the negative log-transformed P value against the log-transformed fold change.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

RELATED ARTICLES

Most Popular

Recent Comments