Mice
The mice used in this study were housed in pathogen-free facilities at the University of Chicago and Stanford University. All mice were housed in positively pressurized, individually ventilated cage racks and changed in biological safety cabinets. Cage supplies were sanitized using hot water (82 °C). Bedding and shredded-paper enrichment were autoclaved and cages were provided with irradiated food. Reverse Osmosis water was provided by an automated watering system directly to each cage. Rodent housing rooms were maintained at a 12 h:12 h light:dark cycle. Temperature and humidity were within the Guide for the Care and Use of Laboratory Animals recommended ranges: 20–26 °C and 30–70% humidity. All experiments and animal-use procedures were conducted in compliance with the Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Chicago. B6.129-Trp53LSL-L25Q,W26S,F53Q,F54S heterozygous mice27,61 were provided by Laura Attardi (Stanford University) and were bred with B6-Foxn1cre homozygous mice62 purchased from Jackson Laboratories to generate Trp53LSL-L25Q,W26S,F53Q,F54S/wt;Foxn1cre/wt and Trp53wt/wt;Foxn1cre/wt littermates. Trp53fl/fl mice were purchased from Jackson Laboratories and bred with B6-Foxn1cre mice to generate Trp53fl/fl;Foxn1cre/wt mice. C57BL/6J mice were purchased from Jackson Laboratories. mTECs and thymocytes were collected from mice 4–5 weeks old. Sex-matched littermates were used for all comparisons of genetic perturbations.
Isolation, sorting and analysis of mouse mTECs
Thymic epithelial cells were isolated as previously described63 with minor modifications. In brief, thymi from 4–6-week-old mice were removed and connective tissue was removed. Stromal tissue was perforated using scissors and incubated with rotation in DMEM-F12 (Gibco) at room temperature for 10 min to liberate the thymocytes. The remaining stromal tissue was enzymatically digested (0.5 mg ml−1 Collagenase D (MilliporeSigma), 0.2 mg ml−1 DNaseI (MilliporeSigma), 0.5 mg ml−1 Papain (Worthington Biochemical)). Cells were stained with anti-EpCAM antibodies conjugated to APC-Cy7 (clone G8.8, BioLegend, 3 µl per 100 million cells) and EpCAM+ cells were enriched by positive selection using magnetic anti-Cy7 beads (Miltenyi, 10 µl per 100 million cells). The enriched fraction was stained with the appropriate panel of fluorochrome-conjugated antibodies to CD45 (clone 30-F11, Invitrogen, 1:100), Ly-51 (clone 6C3, BioLegend, 1:100), MHC-II I-A/I-E (clone M5/114.15.2, Invitrogen, 1:100), CD104 (clone 346-11A, BD Biosciences, 1:200), GP2 (clone 2F11-C3, MBL, 1:10), CD177 (clone 1171 A, R&D, 1:25), Ly-6D (clone 49-H4, Invitrogen, 1:200), Sca-1 (clone D7, BioLegend, 1:200), AIRE (clone 5H12, Invitrogen, 1:500), Ki-67 (clone SolA15, Invitrogen, 1:100), SynCAM (clone 3E1, MBL, 1:100), CD171/L1CAM (clone 555, Miltenyi, 1:25) along with fluorescein-labelled UEA-I (Vector Labs, 1:100), Zombie Aqua (BioLegend, 1:500) and DAPI (Invitrogen, 1:20). Intracellular staining for AIRE and Ki-67 was subsequently done using the eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions. Intracellular staining for MDM2 (clone EPR22256-98, Abcam, 1:25) was also done using the eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions with the addition of a 1-h incubation in blocking buffer (eBioscience permeabilization buffer with 5% normal donkey serum) before a secondary stain (BV412 donkey anti-rabbit, Jackson Immuno, 1:50). Cells were sorted using FACS Symphony S6, FACSAria Fusion or FACSAria II equipped with a 100-μm nozzle (BD Biosciences). Flow-cytometry data for thymic mimetic cells were acquired using a Cytek Aurora. All other flow-cytometry data were acquired using a BD LSRII or Fortessa. All flow-cytometry data were analysed using FlowJo (v.10).
Human thymic tissue acquisition and processing
Thymus fragments were obtained from a 12-week-old human patient with no known genetic abnormalities undergoing standard-of-care cardiac surgery. The patient was de-identified on receipt with written informed consent for the release of genomic sequence data in accordance with IRB protocol 20–1392 approved by the Biological Sciences Division and University of Chicago Medical Center Institutional Review Boards at the University of Chicago and protocol 2020-203 approved by the Advocate Aurora Health Research Subject Protection Program and Advocate Aurora Health Care Institutional Review Board. Connective tissue was removed and the remaining tissue was minced, then incubated with rotation in DMEM-F12 (Gibco) at 4 °C for 20 min to liberate the thymocytes. Stromal tissue was enzymatically digested using 0.5 mg ml−1 Collagenase D (MilliporeSigma) and 0.2 mg ml−1 DNase I (MilliporeSigma) at 37 °C for 20 min. The remaining fragments were incubated with rotation in 0.5 mg ml−1 Papain (Worthington), 0.25 mg ml−1 Collagenase D and 0.1 mg ml−1 DNase I at 37 °C for 20 min. Cells were stained with anti-EpCAM antibodies conjugated to APC-Cy7 (clone 9C4, BioLegend, 1:100) and EpCAM+ cells were enriched by positive selection with magnetic anti-Cy7 beads (Miltenyi). The enriched fraction was stained with DAPI (Invitrogen, 1:20), CD45 (clone 2D1, BioLegend, 1:100), LY51/CD249 (clone 2D3/APA, BD Biosciences, 1:00) and HLA-DRA (clone L243, BioLegend, 1:100) and sorted on a Symphony S6 (BD Biosciences).
Flow cytometry of thymocytes and splenocytes
Thymi from 4–6-week-old mice were removed and small cortical incisions were made before mechanical agitation with wide-bore glass pipettes in DMEM/F-12 (Gibco) to liberate the thymocytes. Spleens from mice aged 4 weeks to 12 months old were isolated in RPMI (Gibco) supplemented with 10% FCS. Cells were liberated by mincing with a syringe plunger and filtered through a 40-μm strainer. Following red blood cell lysis (BD PharmLyse), cells were stained with fluorochrome-conjugated antibodies specific for mouse CD4 (GK1.5, 1:100), CD8α (53-6.7, 1:100), CD25 (PC61, 1:100), CD44 (IM7, 1:100), CD69 (H1.2F3, 1:100), CD62L (MEL-14, 1:100), TCRβ (H57-597, 1:100) and DAPI (Invitrogen, 1:20). Intracellular staining for FoxP3 (clone FJK-16s, eBioscience, 1:100) was done using an eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions. Flow-cytometry data were acquired using a BD LSRII or Fortessa and analysed using FlowJo (v.10).
Bulk RNA-seq sample preparation
We FACS-sorted 75,000 primary mTECs directly into RULT lysis buffer (Qiagen RNEasy UCP Micro Kit) and total RNA was extracted following the manufacturer’s instructions. The mRNA was enriched and RNA-seq libraries were constructed using an Illumina TruSeq Stranded mRNA kit. Paired-end, dual-index sequencing was performed on an Illumina NovaSeq 6000 platform.
Bulk RNA-seq data processing
RNA-seq reads were mapped to the mm10 mouse genome assembly using TopHat (v.2.1.1) with the setting –microexon-search. Unmapped, unpaired and low-quality reads (MAPQ ≤ 5) were removed using samtools (v.1.9) view with settings -q 5 -f 2. Paired reads were counted for each gene using featureCounts from Subread (v.2.0.1). TPM values were calculated for each gene to quantify the relative abundance of transcripts for clustering analysis. The trimmed mean of M values was calculated for each gene for differential comparisons across samples using edgeR (v.4.0.2) (calcNormFactors()). Common dispersions were estimated using estimateCommonDisp() and Benjamini–Hochberg FDRs were calculated for pairwise comparisons using the exactTest(). Genes with FDR ≤ 0.05 were regarded as significant.
Definition of tissue-specific and AIRE-dependent genes
Previously published transcriptional data64 from Aire wild-type and Aire-knockout mTEChi were analysed according to the bulk RNA-seq pipeline outlined above. Genes that exhibited at least 1.5-fold induction in Aire wild type relative to Aire knockout and had Benjamini–Hochberg FDR ≤ 0.05 were regarded as Aire-induced. TSGs were classified as previously64, and αTSGs were taken to be the intersection of these two gene sets. For human TSGs, GTEx65 expression counts (median TPM), Shannon entropy \(\left(S=-\sum p{\log }_{2}p\right)\) across tissues was calculated for each gene. Genes with an entropy S ≤ 3 were included for downstream analyses.
Multiome sample preparation and sequencing
For all Multiome experiments, we used an ATAC + GEX single-cell kit and protocol (10X Genomics 1000236 with protocol CG000338 RevE) with minor modifications to sample preparation. In brief, 40,000 mTECs were FACS-sorted into 1× PBS supplemented with 2% BSA and centrifuged at 300g for 5 min. Cells were gently washed in 50 μl lysis buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2 in nuclease-free water) and centrifuged at 300g for 5 min. Cells were resuspended in 50 μl permeabilization buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween20, 0.01% digitonin and RNase inhibitor (Invitrogen) in nuclease-free water) and incubated for 5 min on ice. Nuclei were gently washed with wash buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween20 and RNase inhibitor in nuclease-free water) and centrifuged at 500g for 5 min. Finally, nuclei were resuspended in 5 μl chilled diluted nuclei buffer (10X Genomics) and added to the transposition mix. Paired-end, dual-index sequencing was performed on an Illumina NovaSeq 6000 platform.
Multiome data quality control
After sequencing, bcl files were converted to fastq using cellranger-arc (v.2.0.2) mkfastq. FASTQ files were aligned to the mm10 or hg38 genome assembly using cellranger-arc count. ATAC-seq fragment files were used as inputs to the ArchR66 (v.1.0.2) analysis pipeline in R (v.4.3.2). Transcript count matrices were used as inputs to the Seurat (v.5.1.0) gene expression analysis pipeline. For gene expression quality control, cells with nFeature_RNA ≥ 250 and ≤ 6,000, nCount_RNA ≤ 25,000 and percent_mitochondrial ≤ 25 were included for downstream analyses. Transcript counts were log-normalized. For scATAC-seq quality control, cells with n_ATAC_Frags ≥ 3,000 and TSS_Score ≥ 10 were included for downstream analyses. Doublet inference was conducted using ArchR addDoubletScores(), and presumed doublets were excluded. Cells that passed each filter were admitted for downstream analyses. Finally, based on gene expression markers, contaminating cells (thymocytes) and putative mTEC mimetic cells were excluded from analysis (except for targeted analyses of mimetic compartments). In the wild-type multiome (Fig. 1), a further cluster of cells that exhibited uncharacteristically low TSS enrichment scores was excluded.
Multiome data processing
Dimensionality reduction, scATAC-seq clustering, projections, pseudotime, transcription factor motif enrichment (except for scATAC-seq fragments or genomic tiles, which was computed using HOMER2 (v.5.1) findMotifsGenome.pl with settings -size given), and transcription factor footprinting were performed using the ArchR pipeline with default parameters. For UMAP plots overlaid with continuous colour scales, MAGIC67 (v.2.0.3) imputation was used for data smoothing to facilitate better visualization. MAGIC-imputed values were used for UMAP display purposes only; imputed values were not used anywhere else in the analysis of scATAC-seq or scRNA-seq datasets (such as violin plots or heatmaps). For scATAC-seq peak calling, the standard ArchR workflow was used using MACS2 (v.2.2.9.1). To maximize the detection of open chromatin regions specific to each sample and stage in the mTEC developmental trajectory, fixed-width 501-bp scATAC-seq peaks were called (extendSummits = 250) on the Tn5-corrected single base insertions (shift = −75, extsize = 150, –nomodel) for each scATAC-seq cluster identified per sample (groupBy = Clusters, reproducibility = 1) using the ArchR wrapper function addReproduciblePeakSet(). The significance of each called peak was calculated as a false discovery rate (q-value) comparing the observed number of Tn5 insertions in the sliding window (300 bp) and the expected number of insertions (total number of insertions/genome size (–nolambda)). A q-value cutoff (cutOff = 0.1) and an upper limit for the number of peaks called per cell (peaksPerCell = 1,000, minCells = 100) were applied to prevent consideration of low-quality peaks. We also excluded peaks that mapped to the mitochondrial or Y chromosomes (excludeChr = c(chrM, chrY)). Peak sets called from each scATAC-seq cluster from respective samples were combined and trimmed for overlap using an iterative procedure that discarded any peak that directly overlapped with the most significant peak66. The resultant ‘union peak set’ was applied to all cells for WIP and OOP count-based and motif-based analyses. The fraction of fragments within peaks was computed automatically as a product of the addReproduciblePeakSet() function. Subnucleosomal and mononucleosomal fractions for each cell or sample were computed as the fraction of the cell’s scATAC-seq fragments whose length L ≤ 100 bp (subnucleosomal) or 100 < L ≤ 200 bp (mononucleosomal). To ensure reproducibility of bioinformatic analysis results, for each dataset, a single script was used for all the quality control and pre-processing, including purging of low-quality cells, doublet removal, peak calling, motif enrichment, dimensionality reduction and clustering. A file representing the full processed data was saved using saveArchRProject() and loaded for all subsequent analyses (this file was not edited after pre-processing). More individual scripts were used to load processed data and perform specific analyses or generate specific figures.
Peak-centric differential accessibility analysis
Differential chromatin accessibility analysis across peaks was done using ArchR getMarkerFeatures() with the following arguments: useMatrix = PeakMatrix, bias = c(TSSEnrichment, log10(number of scATAC-seq fragments)), testMethod = wilcoxon.
Processing of OOP scATAC-seq fragments
For each Multiome dataset, WIP and OOP fragments near genes of interest (such as αTSGs, housekeeping genes and maturation-induced genes) were retrieved using the ArchR and GenomicRanges R packages. For each gene: first, a search window, search_window, was established around the \({\rm{TSS}}({\rm{search}}\_{\rm{window}}={\rm{TSS}}\pm {\ell })\); and second, scATAC-seq fragments intersecting the search_window were retrieved from cells of interest, cell_subset, using the ArchR getFragmentsFromProject() function with arguments subsetBy = search_window and cellNames = cell_subset. Fragments were then partitioned based on whether they overlapped the data’s union peak set using subsetByOverlaps() with arguments invert = FALSE to retrieve WIP fragments, or invert = TRUE to retrieve OOP fragments. Finally, fragments were binned and/or tallied for the specific application (see below).
Analyses comparing αTSGpos and αTSGneg mTECs
Cells from early mature, mid mature and late mature clusters expressing any αTSGi > 0 were selected as the αTSGpos cohort and a size-matched cohort of αTSGneg cells was sampled randomly from the remaining cells from the same three clusters. These cohorts were then used as inputs to getMarkerFeatures()in ArchR for differential accessibility of peaks between αTSGpos and αTSGneg mTECs. For local OOP and WIP analysis, ATAC-seq fragments within peaks and outside of peaks from αTSGpos and αTSGneg cohorts were intersected with a ±5 kb sliding window with 1 kb increments, normalized to the total number of ATAC-seq fragments per cell, and tallied in each window within a region flanking αTSGi . For αTSG coexpression analysis, the probability of detecting each αTSGi neighbouring αTSG0 within the specified length scale (or a randomly selected alternative αTSG as a control) was computed for each of the αTSGpos and αTSGneg cohorts.
Regression analysis
For each αTSGi, the total number of OOP and WIP scATAC-seq fragments within the characteristic window of instability \(({\ell }=\pm 50\,{\rm{kb}})\) was computed for each mTEC in the early mature, mid mature and late mature clusters. A logistic regression framework was used (glm() with family = binomial) to estimate the probability of expressing a given αTSG based on the number of log10(OOP + 1) or log10(WIP + 1) fragments using log10(n_ATAC_Frags) per cell as a covariate. P-values for regression coefficients were generated using the Wald-χ2 test (anova(test = ‘LR’)).
CUT&RUN sample preparation
CUT&RUN was performed as previously described28 with minor modifications. In brief, 350,000–500,000 cells were washed 3 times in wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 1× EDTA-free protease inhibitor cocktail (Roche)) then bound to Concanavalin-A beads (Bangs Laboratories) according to the manufacturer’s instructions. Cells were incubated with 1:100 dilution of anti-p53 antibody (Leica NCL-L-p53-CM5p) for 2 h or overnight at 4 °C in permeabilization buffer (1× permeabilization buffer (eBioscience), 0.5 mM spermidine, 1× EDTA-free protease inhibitor cocktail, 2 mM EDTA). The sample was then incubated with 700 ng ml−1 pA-MNase (S. Henikoff) in permeabilization buffer at 4 °C for 1 h. Digestion was done in 0.5× permeabilization buffer supplemented with 2 mM CaCl2 at 4 °C for 1 h. The reaction was stopped by the addition of 2× stop buffer (final concentration 100 mM NaCl, 10 mM EDTA, 2 mM EGTA, 20 μg ml−1 glycogen, 25 μg ml−1 RNase A (Thermo Fisher)) and the sample was incubated at 37 °C for 20 min. Protein in the sample was then digested in 0.1% SDS and 250 μg ml−1 Proteinase K (New England Biolabs) for 2 h at 56 °C, shaking gently. CUT&RUN fragments were purified by phenol chloroform extraction. CUT&RUN libraries were generated using NEBNext UltraII DNA Library Prep Kit for Illumina coupled with NEBNext Multiplex Oligos for Illumina (New England Biolabs) with modifications optimized for small fragments, as detailed in https://doi.org/10.17504/protocols.io.wvgfe3w. Paired-end, dual-index sequencing was performed on the Illumina NextSeq500 platform.
CUT&RUN data processing
CUT&RUN reads were mapped to mm10 mouse genome assembly using Bowtie2 (v.2.2.9) with settings –local –very-sensitive-local –no-unal –no-mixed –no-discordant –phred33 -I 10 -X 700. PCR duplicates were removed using Picard (v.2.21.8) MarkDuplicates REMOVE_DUPLICATES=true VALIDATION_STRINGENCY = LENIENT. Reads with MAPQ scores below 30 were purged and excluded from downstream analysis using samtools (v.1.9) view -b -q 30 -f 2 -F 1804. Peaks were called for each sample using MACS2 (v.2.2.7.1) with settings –shift 0 –extsize 200 –nomodel –call-summits –keep-dup all -p 0.01. For each sample, a 301-bp fixed-width peak set was generated by extending the MACS2 summits by 150 bp in both directions. Peaks were ranked by significance (MACS2 peak score) and overlapping peaks with lower peak scores were removed iteratively to create non-overlapping sample peak sets. Peaks mapping to chrY, as well as any that spanned genomic regions containing “N” nucleotides, were removed. Robust peaks were defined by a score per million (SPM) (each peak score divided by the sum of all peak scores in the sample, divided by 1 million), and we retained only those peaks with SPM ≥ 5. We defined p53 CUT&RUN peaks by further filtering for peaks that overlapped with known p53-binding motifs (HOMER2, v5.1) from samples with characterized p53 activity (mTEClo samples). CUT&RUN fragment counts across regions of interest were normalized by the number of unique fragments in the sample library.
ChIP–seq data processing
ChIP–seq reads were mapped to mm10 mouse genome assembly using Bowtie2 (v.2.2.9) with settings –very-sensitive -X 2000. PCR duplicates were removed using Picard (v.2.21.8) MarkDuplicates REMOVE_DUPLICATES=true VALIDATION_STRINGENCY = LENIENT. Reads with MAPQ scores below 30 were purged and excluded from downstream analysis using samtools (v.1.9) view -b -q 30 -F 1796. ChIP–seq read counts were normalized by the number of unique reads in the sample library.
Histopathology
Histopathology experiments were carried out as previously described9. In brief, tissues were fixed in buffered 10% formalin and paraffin-embedded. H&E staining was done by the standard methods. Histopathology scores were assigned using a four-tier system based on the degree and distribution of lymphocytic infiltration observed in the tissue sections. A score of 0 was assigned when no lymphocyte infiltration was detected; a score of 1 corresponded to minimal infiltration, characterized by very few small, isolated clusters; a score of 2 corresponded to moderate infiltration, in which several small to moderately sized clusters of lymphocytes were observed; a score of 3 corresponded to severe, diffuse infiltration, indicated by the presence of numerous large clusters distributed throughout the tissue.
Statistical analysis
De novo and known transcription factor motif P-values were determined using HOMER2 (v.5.1). For bulk RNA-seq, P-values for differentially expressed genes were computed using edgeR (v.4.0.2) (estimateCommonDisp()) and corrected for multiple testing using the Benjamini–Hochberg FDR method. For scATAC-seq and scRNA-seq, FDR-corrected Wilcoxon test P-values for differentially accessible ATAC peaks and differentially expressed genes were computed using ArchR (v.1.0.2) (getMarkerFeatures(testMethod = “wilcoxon”)). Logistic regression coefficient estimate P-values were computed using analysis of variance (ANOVA; anova(test = “Chisq”)) to compare the regression results from glm(). Box plots show the median (centre line), 25th and 75th percentiles (edges), and whiskers show ±1.5 times the interquartile range. Outliers beyond the interquartile range are represented as individual dots. All other P-values and statistical tests were computed in R or Prism and are specified in the figure legends.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.