Mouse models
The intestinal epithelium-specific Villin-creERT2 (ref. 61) mouse line was intercrossed with various conditional tumour driver models: Kras(lsl)G12D/WT (KrasG12D)62, Pik3ca(lsl)H1047R/WT (Pik3caH1047R)63, ApcΔ14/WT (Apchet)64, Trp53Δ2-10/Δ2-10 (Trp53null)65, Trp53Δ2-10/WT (Trp53het), PtenΔ5/Δ5 (Ptennull)66, Fbxw7Δ5/Δ5 (Fbxw7null)67, R26-Notch1ic/WT (N1-ICDhet)68, Arid1aΔ8/Δ8 (Arid1anull)69, and R26-CAG-Brainbow2.1/Confetti (R26(lsl)Confetti/WT)70,71. Lgr5-DTR-eGFP72 mice were used to study the effects of ENU mutagenesis on Lgr5-positive cells. Mice were from mixed backgrounds with high C57Bl/6J inheritance. Genotyping was done by Transnetyx.
Animal husbandry
All experiments used male and female mice of at least 6 and 8 weeks of age, respectively. Mice were housed under controlled conditions (temperature 21 ± 2 °C, humidity 55 ± 10%, under a 12 h–12 h light–dark cycle) in individually ventilated cages in a specific-pathogen-free facility (tested according to the recommendations for health monitoring by the Federation of European Laboratory Animal Science Associations). Food and water were provided ad libitum. None of the mice were involved in any procedure before the study. For survival curve generation, mice were aged until they showed clinical signs of tumour burden (anaemia, hunching and loss of body condition). All of the animal experiments were performed in accordance with the guidelines of the UK Home Office under the authority of project licence PD5F099BE, approved by the Animal Welfare and Ethical Review Body at the CRUK Cambridge Institute, University of Cambridge.
Field induction and ENU mutagenesis
Complete intestinal field induction was achieved by a single intraperitoneal injection of 4 mg (3 mg for mice <20 g in weight) of tamoxifen (Merck T5648) dissolved in ethanol and sunflower oil (Merck S5007) at a 1:9 ratio (20 mg ml−1 stock solution). A lower tamoxifen dose (0.15 mg) was used for KrasG12D mice in the sequencing cohort to reduce tumour multiplicity and therefore random tumour collisions. We resuspended ENU (Merck N3385) in 5 ml 95% ethanol/45 ml phosphate/citrate buffer at a final concentration of 20 mg ml−1 (ref. 73) and injected it intraperitoneally at 200 mg per kg.
Mouse dissection
Survival data and most tumour counts were generated from mice culled at the humane end point. Timed culls before this end point were used to assess tumour initiation kinetics in the control cohort and to allow tumour counts in high tumour density models (KrasG12D cohort). The pancreas, mesenteric lymph nodes, liver and lung were dissected and placed in formalin solution overnight at room temperature then exchanged to 70% ethanol for histological assessment of metastasis. Other organs were also collected and processed if relevant pathology was observed during dissection. Intestines were dissected, longitudinally opened, whole-mounted and fixed in 4% PFA overnight at 4 °C in an orbital shaker. Tumours were counted and either excised and processed for genomic DNA (gDNA) extraction or Swiss-rolled for histopathological assessment.
Intestinal tumour count and microscopy
Tumours in whole-mounted intestines were counted under a dissection microscope. Proximal SI was defined as the first 16–20 cm, comprising SI10 and SI20. Distal SI was defined by the remaining length, comprising SI30, SI40 and, sometimes, SI50. The colon location comprised the caecum and proximal and distal colon. Apchet (SI + colon) and KrasG12D (colon) whole-mounts were sliced into 2–3 cm pieces, optically cleared using the CUBIC1a protocol74 for 7 days at 37 °C with solution change every other day, stained with DAPI (1:1,000; Thermo Fisher Scientific, D1306), and refractive-index-matched in Rapiclear 1.52 (Sunjin Lab, 152002) for 2 days before mounting in 1 mm iSpacers (Sunjin Lab). Representative tissue pieces for each region were imaged throughout the whole thickness in a Leica SP5 TCS confocal microscope (×10/0.4 NA objective, 8–11 µm z-step, ×1.5 optical zoom). Tumours were manually counted using Fiji image analysis software75 and the total tumour count was extrapolated.
Lineage tracing of ENU-treated mice
Villin-creERT2R26(lsl)Confetti/WT mice were treated with ENU (200 mg per kg) or vehicle on day 0 then induced with 1 mg tamoxifen on day 18. Mice were dissected 21 days after tamoxifen and representative 2 cm segments of the proximal SI were collected. Tissue samples were fixed overnight in 4% PFA, washed in PBS, optically cleared, mounted and imaged using a confocal microscope. The width of YFP-positive clones was quantified by calculating the fraction of the crypt circumference (1/8 to 8/8) occupied by YFP. The average clone width per treatment was calculated by multiplying the count of each clone size by the number of crypt fractions occupied (1–8), then dividing by the total number of clones counted.
Tumour gDNA extraction
Whole PFA-fixed tumours were washed in PBS, excised, cut into smaller pieces and gDNA extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen 56404). In brief, tissue was digested in lysis buffer + proteinase K solution overnight at 56 °C with shaking at 1,200 rpm in a thermocycler (Eppendorf) and then homogenized in QIAshredder columns (Qiagen 79656) before in-column washing and elution in ATE buffer according to the manufacturer’s protocol (although the 90 °C incubation step after tissue digestion was omitted to avoid excessive DNA fragmentation). The concentration of the extracted DNA was assessed using a Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Q32854).
Antibodies
Primary antibodies
CTNB1 (0.25 µg ml−1, mouse, 610154, BD Biosciences), O6-ethyl-2′deoxyguanosine (0.5 µg ml−1, rat, SQX-SQM001, Squarix Biotechnology), GFP (10 µg ml−1, chicken, ab13970, Abcam), PTEN (1:300, rabbit, 9559, Cell Signalling), BrdU (2.6 µg ml−1, sheep, ab1893, Abcam), CD326 (EPCAM) AlexaFluor 647 antibody (1:2,000, 118210, BioLegend). Secondary antibodies were as follows: rabbit anti-rat (1:250, A110-322A, Bethyl Laboratories), rabbit anti-mouse (1:1,500, ab125913, Abcam), anti-chicken (1:500, 303-005-003, Jackson Labs) and anti-sheep (1:500, 313-005-003, Jackson Labs).
Immunohistochemistry
Fixed tissue was paraffin embedded and cut into 3–4 µm sections by the CRUK CI Histopathology Core. Haematoxylin and eosin (H&E) staining was performed using an automated ST5020 Multi-stainer (Leica Biosystems). Staining was performed on Leica’s automated Bond-III platform in conjunction with the Polymer Refine Detection System (Leica, DS9800). For CTNB1 and O6-ethyl-2′deoxyguanosine, antigen retrieval was performed in sodium citrate (Leica’s Epitope Retrieval Solution 1, AR9961) for 20 min at 100 °C. For GFP and BrdU, antigen retrieval was performed using enzymatic digestion. In brief, enzyme digestion (proteinase K) was run at 37 °C using Leica’s Bond enzyme concentrate (AR9551), which contains proteolytic enzyme (17 mg ml−1) and stabilizer. For BrdU, slides were submerged in 2 M HCl for 15 min, following the dewax and rehydration step. Blocking was performed with Protein Block Buffer (X090930-2, Dako). After incubation with the primary antibody, the sections were incubated with the secondary antibody before development and mounting. For CTNB1 staining, an additional mouse-on-mouse blocking step was performed.
RNA in situ hybridization
Detection of mouse Trp53 was performed on FFPE sections using RNAscope 2.5 LS Reagent Kit-RED (322150, ACD) and RNAscope LS 2.5 Probe-Mm-Trp53-O1 (513008, ACD). In brief, 3-µm-thick sections were baked for 1 h at 60 °C before loading onto a Bond Rx instrument (Leica Biosystems). Slides were deparaffinized and rehydrated on-board before pretreatment using Epitope Retrieval Solution 2 (AR9640, Leica Biosystems) at 95 °C for 15 min and ACD enzyme from the LS Reagent kit at 40 °C for 15 min. Probe hybridization and signal amplification were performed according to the manufacturer’s instructions, with the exception of Amp5 (incubation for 30 min). Fast red detection was performed on the Bond Rx using the Bond Polymer Refine Red Detection Kit (DS9390, Leica Biosystems) with an incubation time of 20 min. The slides were then removed from the Bond Rx, heated at 60 °C for 1 h, dipped in Xylene and mounted using EcoMount Mounting Medium (EM897L, Biocare Medical). The slides were imaged on the Aperio AT2 (Leica Biosystems) system to create whole-slide images. Images were captured at ×40 magnification with a resolution of 0.25 μm per pixel.
Single-cell preparation for flow cytometry
The first 10 cm of proximal SI from tamoxifen-induced mice was flushed in cold PBS, longitudinally opened and cut into 0.5 cm segments. The segments were repeatedly passed through a 10 ml pipette using cold PBS until the solution ran clear, incubated in cold 5 mM EDTA/PBS for 30 min, washed and resuspended in cold PBS. Crypt-enriched fractions were released by five 5 s manual shaking cycles then pelleted and dissociated in 20 ml trypsin (1× 0.05% EDTA, 25300054, Thermo Fisher Scientific) at 37 °C for 7 min with vigorous shaking every minute. After multiple washes in PBS/2% FBS, dissociated cells were resuspended in PBS/2% FBS before incubation with anti-mouse CD326 (EPCAM) AlexaFluor 647 antibody and addition of DAPI (10 μg ml−1) to distinguish between live and dead cells. Flow sorting was carried out on the BD FACS Aria SORP (BD Biosciences) system using appropriate single-stained and unstained controls (Supplementary Fig. 6).
Genotyping PCR
Sorted EPCAM-positive and EPCAM-negative cells were pelleted and washed once in cold PBS. Genomic DNA extraction was performed using the ReliaPrep gDNA tissue miniprep system (A2051, Promega) and PCR was performed according to the Q5 Hot Start High-Fidelity DNA Polymerase protocol (M0493L, New England BioLabs) with enhancer, using the following primers (10 µM, 1:1:1 stoichiometry): KrasG12D, 5′-GTCTTTCCCCAGCACAGTGC, 5′-CTCTTGCCTACGCCACCAGCTC, 5′-AGCTAGCCACCATGGCTTGAGTAAGTCTGCA; Fbxw7null, 5′-CAGTGGAG TGAAGTACAACTC, 5′-GCATATTCTAGAGGAGGGTAT, 5′-GGCCAGCC TGGTCTGTATA. The resulting PCR products were analysed using the TapeStation 4200 System (Agilent).
Hybridization capture bait design
A custom mouse cancer gene list was assembled, with a total of 589 genes. Capture baits were designed using the SureDesign application (Agilent). Species: Mus musculus (UCSC mm9, NCBI build 37, July 2007). Target parameters: coding exons + 10 bases each from the 5′ and 3′ ends. Target summary: 9,936 target IDs resolved to 9,936 targets comprising 9,936 regions for a total of 588 genes. Region size: 6.778 Mb. Probe summary: total probes, 155,861; total probe size, 6.757 Mb; coverage, 98.7%.
Targeted amplicon panel design
Standard BioTools D3 Assay Design software was used to assemble a targeted panel of primers covering ten genes (Apc, Ctnnb1, Kras, Nras, Hras, Braf, Pten, Fbxw7, Smad4 and Trp53). Apc, Ctnnb1, Kras and Trp53 had 100% of their exomes covered by the panel, whereas coverage for the other genes was limited to previously identified hotspots sequenced using the hybridization capture panel (Supplementary Fig. 5a).
Hybridization capture panel sequencing
Captured DNA was sequenced on the Illumina HiSeq 4000 platform, generating paired-end sequence reads from multiplexed libraries. Bait coordinates were mapped relative to the GRCm38 reference using the liftOver tool76. Adaptor clipping and PCR duplicate marking were performed using biobambam2 (v.2.0.79)77 and sequence reads were aligned to GRCm38 using BWA-MEM (v.0.7.17)78. Of the 588 genes originally targeted, 551 (94%) had at least 90% of their coding regions (Ensembl v96 gene build) within 100 bp of a bait mapped to GRCm38. Two genes (Rit1 and Dlg2) had minimal or no coverage, and Trerf1 was fully covered but not annotated as a gene in the Ensembl v96 gene build. A total of 585 genes (Supplementary Table 3) were analysed for variant calling. The median depth of sequencing coverage across all tumour and normal samples (excluding PCR duplicates) in the target regions was 308×.
Hybridization capture sequencing variant calling and filtering
The Pisces (v.5.2.10.49) single-sample variant caller79 was used to call SNVs in each tumour and normal adjacent sample. The stitcher tool within Pisces was used to preprocess the alignment files, followed by variant calling and variant quality recalibration. Normal samples were processed in both the somatic and germline modes within Pisces to call variants with high and low VAFs. The consequences of variants were annotated with the Ensembl Variant Effect Predictor (VEP)80 using gene builds from Ensembl release 96. In genes with multiple transcripts, only the consequences associated with the canonical transcript, as defined by Ensembl, were considered. In tumour samples, variants that failed Pisces internal filters, variants annotated as single-nucleotide polymorphisms (SNPs) or indels in the Mouse Genomes Project (MGP release v6)81, and variants overlapping at least one normal sample from germline calling mode were removed. Variants were not removed if they were in the same genomic position as a MGP variant or germline call but with a different variant allele. To remove artefacts from PCR errors, sequencing errors and read misalignments, several additional filtering steps were implemented. To select SNVs with high-quality mismatched bases, SNVs were required to have at least two reads with the alternative allele, where at least half of the alternative alleles had a minimum base alignment quality score of 30 (ref. 82). To remove artefactual variants found in regions where reads are frequently misaligned, SNVs were removed if at least four normal samples had at least four gapped reads within 10 bp of the variant position. To further remove likely artefacts with low VAFs, SNVs were removed if they were called in somatic mode in at least two normal samples. Moreover, a variant was removed if it fell inside a genomic region with a mappability score <1, where mappability was calculated from 75-mers across the reference genome using gem-mappability (build 1.315)83. After removing SNPs and artefacts as above, we identified SNVs with low VAFs that were recurrent in the tumours of one or more mice. We suspect that these SNVs were due to coincident lymphoid expansions (leukaemia/lymphoma); recurrent variants were therefore excluded from the tumour call set if they had a maximum VAF ≤ 0.05, or if the variant in a matched normal sample(s) had a VAF ≥ 0.01.
Identification of driver genes
The dNdScv package (v.0.1.0)16 was used to identify genes under positive or negative selection. Genes that were either absent from the RefCDS used (Trerf1) or not covered by pull-down baits (Dlg2, Rit1) were removed from the input gene list. The following dNdScv parameters were used: (refdb = “mm10_Ens96_dNdScv.rda”, max_muts_per_gene_per_sample = Inf, max_coding_muts_per_sample = Inf).
Targeted amplicon sequencing
The targeted amplicon library was prepared according to the Standard BioTools protocol using the 8.8.6 integrated fluidic chip and the Juno system. In brief, each integrated fluidic chip allowed the interrogation of 48 samples against eight primer pools, leading to the generation of 286 amplicons for each sample. The collected amplicons for each chip run were quantified using a Bioanalyzer 2100 (Agilent) and pooled equimolarly before sequencing as paired-end 150 bp reads on the Illumina NovaSeq platform.
Targeted amplicon sequencing variant calling and filtering
FASTQ files were aligned against the Genome Reference Consortium mouse genome 39 (GRCm39)84 using bwamem (https://github.com/lh3/bwa). Samples with a median number of reads per amplicon of <10 were excluded from further analysis. Mutation calling was performed using the ampliconseq pipeline (https://github.com/crukci-bioinformatics/ampliconseq) with VarDict as variant caller and a minimum allele fraction threshold of 0.01. Variant annotation was performed using Ensembl VEP80. The list of called mutations was filtered to remove variants that did not pass internal noise filters. Indels were removed because of the tendency for ENU to cause SNVs85. Finally, variants were retained only if they were called in at least two overlapping amplicons per sample and supported by at least five mutant reads.
ENU mutation signature inference
We inferred ENU signatures from SNVs called in hybridization capture-sequenced tumour samples from all cohorts and protocols (n = 15,979). A subset of these SNVs (n = 1,573) are from ENU-treated cohorts not analysed in the manuscript but were included to increase the total sample size for this analysis. Apc high-impact SNVs and all SNVs in Ctnnb1, Ros1 and Ntrk3 were excluded. A total of 15,362 SNV trinucleotide contexts were converted to the COSMIC convention and normalized to SNV abundance within the hybridization panel target regions to calculate frequency. In O/E ratio plots for Ctnnb1 mutations, the expected frequencies for each mutation trinucleotide context (Fig. 2c) were rescaled by dividing the frequency by the sum of frequencies present in each protocol or cohort analysed (Extended Data Fig. 5a,d). For Apc, expected frequencies were calculated per domain bin (A, B, C, D, E) where all frequencies from trinucleotide contexts resulting in a stop codon were summed. Expected frequencies for each of the 15 Apc bin combinations were calculated by multiplying the probabilities of the two component bins.
Decay curves
To generate decay curves illustrating the proportion of tumours remaining in the rescue protocol, we used Apc and Ctnnb1 mutation ratios obtained from sequencing experiments (Figs. 1h and 4e and Extended Data Fig. 8e). These ratios were then multiplied by the average tumour count in the SI at each timepoint, considering both priming TE and the rescue ET10 and ET30 protocols (Fig. 4b). Subsequently, these values were divided by the TE protocol (X = 0) to generate the plots. The same logic was applied for individual Apc bins and Ctnnb1 exon-3 mutation decay curves where the ratios obtained from sequencing experiments were multiplied by the average tumour count for Apc and Ctnnb1 at each timepoint. χ2 statistical tests were used to compare observed and expected tumour counts at each rescue timepoint, with values of <0.05 considered to be statistically significant. Expected tumour counts were generated from neutral decay curve proportions at day 10 and day 30, multiplied by the observed tumour count in the TE priming protocol (X = 0). The neutral decay curve was inferred using CryptDriftR (https://github.com/MorrisseyLab/CryptDriftR) with the following parameters: Ns (number of stem cells) = 5, λ (replacement rate) = 0.1, Pr (probability of replacement) = 0.5, tau = 1 in a time sequence (1,30, by = 0.5).
Copy-number analysis
The tumour gDNA samples used for copy-number analysis comprised a subset of the samples used for hybridization capture sequencing (Extended Data Fig. 1). Libraries were prepared from 400 ng gDNA according to the manufacturer’s guidelines (Illumina DNA PCR-Free Prep, Tagmentation, 20041795). Libraries were sequenced on the NovaSeq 6000 (Illumina) system. Shallow whole-genome sequence data were aligned to the mm10 reference genome using BWA (v.0.7.17)86 and processed with QDNAseq (v.1.30.0)87 to generate copy-number data. Read counts within 100 kb bins, corrected for sequence mappability and GC content, were normalized relative to a matched normal adjacent sample, if available, before segmentation. For tumour samples for which no matched normal was used, 250,000 reads were sampled from each sample and pooled to create an aligned BAM file that was used as the control. Ploidy (defined as 2), cellularity estimation (2 × average VAF) and absolute copy-number fitting were carried out using Rascal (v.0.7.0)88.
GENIE CRC cohort analysis
This study used data from the AACR Project GENIE registry, an international public cancer registry that aggregates clinical-grade cancer genomic data with clinical outcome data89. The GENIE colorectal cohort data were accessed through the registry’s public data releases (version 15), which are available to the research community for use in genomics projects. We analysed tumour samples (one per donor) containing APC-truncating mutations (n = 10,329). Samples were excluded if they had more than two driver mutations (excluding APC and KRAS) from the following genes: ATM, ARID1A, AMER1, BRAF, FBXW7, PTEN, PIK3CA, SMAD4, SOX9 and TCF7L2 (n = 9,386). Moreover, samples with more than two APC mutations were removed. The final dataset (n = 9,278 samples) was analysed on the basis of the presence of one or two APC mutations and KRAS driver mutation status. Each APC mutation (for single-hit cases) or the most C-terminal mutation (for double-hit cases) was assigned a 20-amino-acid repeat retention score of 0–7, representing the number of full-length 20-amino-acid repeats preserved (for example, a mutation retaining three repeats received a score of 3). The expected relative frequencies for each potential APC stop codon, assuming a SNV (Fig. 5b), were calculated on the basis of the 96 COSMIC trinucleotide context frequencies from normal human colon33, recalculated for protein-coding regions.
INCISE cohort
Ethical approval was obtained for the retrospective use of data (GSH/20/CO/002) and analysis of surplus diagnostic tissue without individual informed consent (22/WS/0020) following applications to Glasgow SafeHaven, NHS Greater Glasgow and Clyde and the West of Scotland Research Ethics Committee.
The cohort comprises individuals 50–74 years of age who underwent polypectomy at bowel screening colonoscopy in NHS Greater Glasgow and Clyde between 2009 and 2016 and had a surveillance colonoscopy between 6 months and 6 years after the index procedure. Data, including demography, clinical characteristics, endoscopy report results and histopathological data, were collected retrospectively. These variables included, but were not limited to, the number of polyps present at index and follow-up colonoscopies, polyp histology, morphology, location, size and grade of dysplasia. Data were collected from electronic health records (Clinical Portal, Orion Health and Trakcare, Intersystem), electronic endoscopy reporting software (Unisoft GI Reporting Software, v.2.5, Unisoft Medical Systems) and Telepath Laboratory Information Management System electronic pathology database and linked using the Community Health Index unique identifier. Individuals with CRC, history of CRC, diagnosed polyposis or CRC predisposition syndrome, and inflammatory bowel disease were not included.
INCISE adenoma tissue processing and DNA sequencing
The most advanced adenoma removed at the index bowel screening colonoscopy was identified on the basis of lesion size and highest grade of dysplasia, and the corresponding FFPE samples retrieved from the clinical pathology archive. The identified synchronous and metachronous polyps were similarly retrieved. After DNA extraction (Maxwell CSC FFPE, Promega) and quality control (Qubit 4 Fluorometer, Thermo Fisher Scientific, minimum acceptable DNA concentration 1.0 ng µl−1), sample sequencing and variant calling were performed by the Genomic Innovation Alliance (GIA) using the Agilent SureSelect XT2 HS2 method (Agilent). Regions of interest were enriched with the Agilent SureSelect CancerPlus panel (design ID: S3225252) and the quality and quantity of libraries was determined using the TapeStation using a D1000 ScreenTape (5067–5582, Agilent).
Sequencing data were processed and SNV files were generated using the GIA HOLMES pipeline (v.1.3.1). Bcl2fastq conversion software (Illumina, v.2.19.1.403, v.2.20.0.422 on C++) was used to convert NovaSeq-6000-generated .cblc raw data files to FASTQ files, which were then aligned using Burrows–Wheeler Alignment90 (v.0.7.15 as a C programme). deepSNV/Shearwater91,92 (v.1.22/v.1.1.0, v.1.22.0.5/v.1.1.0 as an R package/R wrapper script) was used for calling SNVs, while Pindel93 (v.0.2.5b8-ww1 as a standalone application) was used for large insertions/deletions. Once called, variants were annotated using CAVA94(v.1.2.2.ww1, v.1.2.2.ww5 on Python). We converted VCF files to MAF format using vcf2maf (v.1.6.22) for easier data handling, and VEP (v.112) was used for reannotation. Discordant grouping was used to identify structural variation breakpoints using BRASS95 (v.5.3.3-ww10 on C++), and copy numbers were called using geneCN (v.2.1 on Perl and on R). A total of 653 index, 123 synchronous and 34 metachronous colonic adenomas were analysed on the basis of the presence of one or two APC driver (truncating) mutations, KRAS driver mutation status and location (right, left, rectum). VAFs above 0.5 for the highest-ranked APC driver mutations were halved, assuming LOH.
Data analysis and visualization
Statistical analyses were performed in RStudio96 mostly with R v.4.3.2. χ2 statistical tests were used to compare observed and expected counts extracted from signature data and decay curves. Two-sided Fisher’s exact tests or two-sample tests of equality of proportions with continuity correction were used to compare mutation proportions. Two-sided t-tests were used to compare tumour mutation burden between two groups and ANOVA with Welch’s correction for 3+ group comparison. log-rank tests were for mouse survival analysis. For data visualization, oncoprint (waterfall), copy-number frequency and copy-number spectral plots were designed using the GenVisR97 package (v.1.34.0), while all other plots were designed using the ggplot2 (ref. 98) package (v.3.5.1). Apc lollipop plots were designed using MutationMapper99,100 and further adapted. All figures were assembled using Inkscape software. Immunohistochemistry and RNAscope image analysis and quantification were performed using QuPath101.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

