Patient enrolment and tissue acquisition
All experimental procedures followed approved guidelines and were approved by the Institutional Review Boards (IRBs) at WCM. Recruited patients signed informed consent under IRB-approved protocols: WCM/New York-Presbyterian (NYP) IRB protocols for Tumour Biobankingâ0201005295; GU Tumour Biobankingâ1008011210; Urothelial Cancer Sequencingâ1011011386; and Comprehensive Cancer Characterization (Genomic and Transcriptomic Profilingâ1007011157, and Precision Medicineâ1305013903). The study was open to all patients with advanced UC at WCM/NYP. Fresh-frozen and FFPE tissues from biopsies, cystectomy and nephroureterectomy specimens from patients with high-grade UC were collected. All pathology specimens were reviewed by board-certified genitourinary pathologists (J.M.M.) at WCM/NYP. Clinical charts were reviewed to record patient demographics, tobacco use, family history of cancer, concurrent cancer, treatment history, anatomical site, pathologic grade and stage using the tumour, node, metastasis (TNM) system.
Rapid autopsy procedures
The Englander Institute for Precision Medicine research protocol at WCM/NYP has been established to promote personalized medicine focused on molecular diagnostics and therapeutics. Patients were given the option to be enrolled in the IRB-approved rapid autopsy programme. In addition, patientsâ next-of-kin provided written consent before autopsy. A systematic autopsy protocol is followed whereby normal and malignant fresh tissues are collected, allocating samples to be snap-frozen or formalin-fixed. The goal is to maximize the amount of tissue collected for research purposes. Once tissue collection is complete, the autopsy proceeds per the WCM Autopsy Service protocol. In this study, tissues from multiple sites were collected, with DNA extracted for WGS following haematoxylin and eosin evaluation and frozen slide annotation.
Whole-genome library preparation and sequencing
WGS libraries were prepared using a KAPA Hyper PCR+ library preparation kit following the manufacturerâs instructions. Before library preparation, FFPE samples were repaired using PreCR Repair Mix (NEB) followed by additional DNA quantification. For library preparation, DNA was sheared using a Covaris LE220. DNA fragments were end-repaired, adenylated and ligated to Illumina sequencing adapters. Libraries went through two post-ligation bead clean-ups, PCR amplification and a final post-PCR bead clean-up. Final library quality was verified using a KAPA qPCR Library Quantification kit (Roche) and a Fragment Analyzer (Agilent). Libraries were normalized, pooled and sequenced on an Illumina NovaSeq 6000 sequencer using 2âÃâ150âbp cycles.
WGS data pre-processing, variant calling and annotation
The New York Genome Center (v.6) somatic pipeline55 was used to align the data and call variants. In brief, sequencing reads were aligned to GRCh38 with BWA-MEM (v.0.7.15)56. Short alignments were removed with NYGC ShortAlignmentMarking (v.2.1) (https://github.com/nygenome/nygc-short-alignment-marking), and mate-pair information was added with GATK FixMateInformation (v.4.1.0)57. Individual lane BAMs were merged and sorted simultaneously with Novosort markDuplicates (v.1.03.01), followed by GATK BQSR. SNVs, multi-nucleotide variants and indels were called using MuTect2 (GATK v.4.0.5.1)58, Strelka2 (v.2.9.3)59 and Lancet (v.1.0.7)60. SVs were called using Svaba (v.0.2.1)61, Manta (v.1.4.0)62 and Lumpy (v.0.2.13)63, with Svaba used for both indels and SVs. Split-read support for SVs was quantified using SplazerS (v.1.1)64. Germline variants were called on the matched normal samples with GATK HaplotypeCaller (v.3.5) and filtered with GATK VQSR at tranche 99.6%. The positions of heterozygous germline variants were used to compute Bâallele frequencies in the tumour samples. Variants were merged across callers and annotated using Ensembl (v.93)65, COSMIC (v.86)66, 1000Genomes (Phase3)67, ClinVar (201706)68, PolyPhen (v.2.2.2)69, SIFT (v.5.2.2)70, FATHMM (v.2.1)71, gnomAD (r.2.0.1)72 and dbSNP (v.150)73 using Variant Effect Predictor (v.93.2)74. Somatic variants that occurred in two or more individuals in our in-house panel of normals were removed, as well as SNV/indels that had minor allele frequency â¥1% in 1000Genomes or gnomAD, and SVs overlapping with DGV (2020-02-25 release)75, 1000Genomes or gnomAD-SV (v.2.0.1)76. SNV/indels with tumour VAF <0.0001, normal VAF >0.2, depth <2 in either the tumour or normal sample, and normal VAF greater than tumour VAF were filtered from the final callset. SNV/indels with support from two or more callers were marked as high confidence. SVs with support from two or more callers or one caller with split-read or CNV changepoint support were marked as high confidence. Variants detected by the somatic calling pipeline in the FFPE normal tissues were removed from the tumour samples. None of the 77 remaining tumour samples was excluded from the analyses due to the absence of ABOPEC mutational signatures.
Purity and ploidy estimation
Purity and ploidy were estimated for each sample using AscatNGS (v.4.2.1)77 and Sequenza (v.3.0.0)78. Estimates were manually reviewed and chosen based on VAF, B-allele frequency and read depth fit. Owing to low purity, six tumours were excluded from downstream JaBbA analysis.
Study sample size definition
Our study included 83 samples: 77 histologically proven UCs and 5 morphologically normal urothelial samples from 50 patients. Five normal urothelium samples were only included in the comparisons between primary and metastatic tumours and normal urothelial samples. The bladder cancer (primary bladder adenocarcinoma) sample from patient WCMIVG101 was only used for ONT long-read WGS. This was a cross-sectional genomic cohort study of patients with advanced UC. Basic covariates (including sex, age at diagnosis, smoking status, highest stage, primary or metastatic tumour, site of tumour collection and platinum-based chemotherapy status) are listed in Supplementary Table 1.
Union of somatic variants for patients with multiple samples
For patients with multiple samples, a union of âHighConfidenceâ somatic SNVs and indels across all the patientâs samples was generated. Pileup (v.0.15.0) (https://github.com/pysam-developers/pysam) was then run on tumour and normal BAM files to compute the counts for variants present in the union variant call format (VCF) file that were missing from each sampleâs VCF. Variants that had a VAFâ>â0 were then rescued and added to the sample VCF. The resulting âunion VCFâ was used for further post-processing.
Mutational signature fitting and assignment
Mutational signature fitting was performed using the Râpackage deconstructSigs (v.1.9)79. HighConfidence variants were input, and COSMIC (v.3.2) signatures (https://cancer.sanger.ac.uk/signatures/)66 was used as a reference. Arguments âcontexts.needed=TRUEâ and âsignature.cutoff=0â were used for SBS, DBS and ID signatures. Additionally, the experimentally derived SBS.A3G from our APOBEC3G transgenic mouse model7 was included to fit the signatures in patientsâ tumours. Following signature fitting, for each SNV, we computed a channel of 78 posterior probabilities (corresponding to each of the SBS reference signatures) that the mutation was caused by a given COSMIC signature (similar to ref. 80). After discarding posterior probabilities below 0.5, the most likely signature was chosen. The assignment was discarded if the trinucleotide context was not one of the top 5 reference signature peaks. APOBEC (SBS2 and SBS13) and platinum chemotherapy (SBS31 and SBS35) were treated separately based on previous knowledge (SBS2: C>T; SBS13: C>A, C>G; SBS31: C>T, T>A; SBS35: C>A, C>G, C>T, T>A). We then refined platinum chemotherapy assignments using patient clinical data. For samples from patients who did not receive platinum chemotherapy, SBS31 and SBS35 mutation assignments were removed. For patients with multiple samples, SBS31 and SBS35 mutation assignments were removed for mutations occurring in both pre-chemotherapy and post-chemotherapy samples. However, if a mutation was unassigned in one post-chemotherapy sample but was assigned to SBS31 or SBS35 in another post-chemotherapy sample from the same patient, the mutation in the first sample was reassigned.
Mutational signature assignment in matched urothelium samples
Mutational signature assignments of variants in the normal urothelium were determined using matched tumours from the same patient. Pileup (v.0.15.0) (https://github.com/pysam-developers/pysam) was run on the normal urothelium, using positions with a SBS2 or SBS13 assignment in the matched tumours. Variants with VAFâ>â0 were counted towards the number of variants assigned to a given mutational signature in the normal urothelium.
Signature clonality fold change analysis
To investigate the timing of ageing (SBS1), APOBEC (SBS2 and SBS13) and chemotherapy (SBS31 and SBS35) mutational processes, we computed a signature clonality fold change as previously described8. To compute the clonal versus subclonal fold change for each post-chemotherapy sample, we pooled all MutationTimer12 clonal categories (early clonal, late clonal, clonal (NA), subclonal) and divided the proportion of clonal mutations by the proportion of subclonal mutations assigned to the same signature. The early versus late fold change was similarly calculated. Mutation counts were pooled for their respective COSMIC signatures. Only samples with sufficient purity for upstream JaBbA analysis were included.
For Fig. 1b, we included nâ=â12 patients with multiple samples who received platinum chemotherapy. Truncal mutations were node alterations shared by all samples in a tree, whereas leaf mutations were node alterations unique to each sample. To statistically substantiate the observation that APOBEC signatures dominated early truncal nodes whereas chemotherapy-associated signatures dominated later leaf nodes in phylogenetic trees, we calculated the fold change of their variants for each patient by dividing the fraction of variants associated with a mutational signature in leaf nodes by the fraction in truncal nodes for each patient. To avoid dividing by zero, a pseudo-count of 1 variant was added before the fold change calculation.
Signature clonality fold change analysis of chromatin-modifying genes
The list of chromatin-modifying genes was curated from REACTOME81 and from previous publications14,82,83. Duplicate genes were removed. The CGC gene list is described in the section âOncoprint of chemotherapy-induced mutationsâ. Notably, among the 743 CGC genes examined, 47 genes overlapped with the 259 chromatin-modifying genes, aligning with the frequent observations of mutations in chromatin-remodelling genes in cancer84. Control genes were randomly sampled from 500 genes not in the list of CGC genes and chromatin-modifying genes (Supplementary Table 5). The Ensembl variant effect predictor (VEP)74 was used to assign functional impact predictions to the detected variants, whereas the Râpackage MutationTimeR (v.1.00.2)12 was used to compute the clonality and CCF for each variant (see the section âMutation timing and CCF calculationâ). Moderate-impact and high-impact variants predicted by the VEP were analysed to prioritize early urothelial mutations with functional consequences.
Estimating the relative velocity fold change of mutagenic processes
Mutagenic velocity (rate of signature accumulation per month of exposure) was calculated for ageing (SBS1), APOBEC (SBS2, SBS13 and DBS11) and platinum chemotherapy (SBS31, SBS35 and DBS5) signatures. Mutation counts from signatures fitted by deconstructSigs79 was divided by the exposure time to mutagenic processes (months). The estimated exposure time for ageing and chemotherapy was between the date of sample collection and the date of birth or the date of the initial chemotherapy treatment, respectively. For patients with an unknown initial diagnosis date, it was assumed to be 1âyear before the earliest date of tissue collection. Duration from birth and date of diagnosis were retroactively calculated from age at diagnosis and sample collection dates. For APOBEC, the exposure time is assumed to be from the sample collection dating back to 10âyears before the UC diagnosis date, as we expected that APOBEC mutagenesis occurs long before the diagnosis of UC. All samples were given one pseudo count of the ageing signature to prevent division by zero in samples with no ageing signature contribution. Samples with zero APOBEC signature contribution and post-chemotherapy samples with zero chemotherapy contribution were excluded from the calculation. The velocity induced by each mutational process within patients with multiple samples was averaged. Fold change was calculated by dividing APOBEC and chemotherapy velocity by ageing velocity.
Oncoprint of chemotherapy-induced mutations
To address the biological significance of the mutations induced by cisplatin treatment, we examined mutations in coding regions assigned to SBS31 and SBS35, which are induced by previous treatment with platinum-based chemotherapy. CGC genes were curated from Cancer Gene Consensus (CGC v.95 Tier 1 and 2 genes) and TCGA 2017 BLCA driver genes, excluding FLAGS (frequently mutated genes)85, with BRCA2, KMT2C and KMT2D rescued back into the gene list.
Association of alterations in driver genes with the presence of SV types
We investigated the association between alterations in driver genes and the presence of SV types using a predefined gene list (Supplementary Table 5). For each combination of gene and SV type, we conducted association tests using the Fisherâs test with a FDR threshold of <0.25. The tested associations included moderate to high impact SNVs/indels, amplifications, a combination of amplifications and SNVs/indels, homozygous deletions and a combination of homozygous deletions and SNVs/indels.
Platinum chemotherapy mutations in regulatory regions and recurrence FishHook analysis
We used three databases for analysis: FANTOM5 for promoters86 (209,911 promoters, 4.2âMB), FANTOM5 for enhancers87 (63,285 enhancers, 18.6âMB) and previously described super-enhancers88 (8,589 super-enhancers, 468.5âMB, with the union taken across 86 cell types). The initial step involved overlaying all SNVs onto these features, disregarding mutational signature assignment, for a comprehensive starting point in our analysis. We then conducted FishHook analysis to identify any regulatory elements that are mutated more than expected based on background. We re-used covariates from FishHook SV analysis (see the section âRegions of recurrent structural variationâ), and used the union (ignoring strand) of promoters, enhancers and super-enhancers as the list of hypotheses to test. Each patient could only contribute one mutation per hypothesis. We included local mutation density as a covariate and kept the FDR cut-off at 0.25.
SNV enrichment analysis
Enrichr refers to an integrative web-based search engine of various gene set libraries and methods to compute gene set enrichment with interactive visualization of the enrichment results89,90,91. In this study, we interrogated the NCI-Nature 2016 pathway database to search for pathways showing a significant increase in the number of mutations assigned to platinum chemotherapy-induced SBS31 and SBS35. The Pâvalue was computed from a two-sided Fisherâs exact test, and pathways with adjusted Pââ¤â0.2 are reported. The enriched pathways were visualized by Appyter available at Enrichr.
Mutation timing and CCF calculation
The Râpackage MutationTimeR (v.1.00.2)12 was run using the union of somatic variants, allele-specific CN output from JaBbA, patient sex information and previously estimated purity values. Parameter n.boot was set to 200. MutationTimeR infers a multiplicity for each mutation and assigns a timing based on the multiplicity and the allele-specific CN configuration at that locus. Using MutationTimeR multiplicities, CCF was computed as previously described92.
Mutational signature analysis of phylogenetic trees
Phylogenetic trees were generated using LICHeE (v.1.0)93 for each patient with multiple samples. LICHeE was run with a variant-by-CFF matrix as input, the âmaxVAFAbsentâ argument of 0.0, the âminVAFPresentâ argument of 0.5, the âmaxClusterDistâ of 0.2, and in cell prevalence mode (-cp). The top-scoring tree was selected. The variants in each of the resulting nodes of the phylogenetic tree output were then fed through deconstructSigs as described above to estimate a set of mutational signature proportions for each node.
Calculating the ratio of dN/dS
Genes with evidence of positive selection were detected using the Râpackage dndscv94,95,96 using default parameters and GRCh38-specific reference data as supplied by the developers. As per the developerâs instructions, mutations shared across multiple samples from the same patient were only listed once. This method of calculating the dN/dS ratio adjusts for other confounders of clonal selection, such as large gene sizes and a high regional mutation rate97. The following classification was used: dN/dSâ=â1, neutral selection; dN/dSâ>â1, positive selection; dN/dSâ<â1, negative selection. Higher missense and nonsense dN/dS ratios indicate a clonal selection of oncogenes and tumour suppressor genes, respectively.
Detection of complex structural variants with JaBbA
Read counts were corrected for GC percentage and mappability in 1-kb bins using fragCounter (https://github.com/mskilab/fragCounter) for all tumour and normal samples. A coverage panel of normal was built from the normal samples and used to denoise the tumour coverage data using dryclean (commit bda8065)98. Denoised tumour coverage profiles, Bâallele frequencies and high-confidence SVs were used as input to JaBbA (v.1.1)2, an algorithm that integrates CNV and SVs into a junction-balanced genome graph, computing integer copy numbers for both. Default parameters were used, except for the slack penalty, which was increased to 1,000. Simple inversions, translocations, duplications and deletions, as well as TICs, quasi-reciprocal pairs, rigma, pyrgo, tyfonas, BFB cycles and DMs, were called on the junction-balanced genome graph using the JaBbA companion Râpackage gGnome (commit c390d80)2. Using the integer CN as output by JaBbA, we computed the fraction of genome altered, defined here as the proportion of autosomes not in a neutral copy state, as defined by sample ploidy. For samples with an intermediate average ploidy (fractional value of 0.4â0.6, for example, 3.5), the copy-neutral state was set as the closest two integer values (for example, for a ploidy of 3.5, the copy-neutral states would be 3 and 4). Otherwise, the copy-neutral state was set as the rounded ploidy.
Regions of recurrent structural variation
A GammaâPoisson model, as implemented in the Râpackage FishHook (commit 06e3927)99, was used to discover regions of recurrent SVs. The genome was partitioned into 100-kb non-overlapping bins, and the union of breakpoints from each patient was used as input as previously described100. Regions overlapping intervals of mappability <1 and centromeres >25% were excluded from the analysis. Covariates were added to model the background mutation rate, including the following:
-
Nucleotide frequency, dinucleotide frequency and trinucleotide frequency
-
H3K4me3 marks (ENCODE accession: ENCFF191IBA), H3K27ac marks (ENCFF208GHP), H3K4me1 marks (ENCFF759BRD), H3K3me3 marks (ENCFF983DSU)
-
DNase hypersensitivity sites (ENCFF823HYK)
-
Replication timing (https://github.com/skandlab/MutSpot/tree/master/features/Ch38), fragile sites (HGNC 2021) and
-
RepeatMasker long interspersed nuclear element, short interspersed nuclear element, long terminal repeat, simple repeat and DNA transposon annotations from UCSC101.
A FDR-adjusted (BenjaminiâHochberg) Pâvalue cut-off of 0.25 was used to nominate significant breakpoint hits. Cancer-related genes were from the COSMIC Cancer Gene Consensus102.
Detection of ecDNA-forming structural variants
We noted that JaBbA does not impose any kind of cyclic constraint when calling high-level amplifications (for example, DMs, tyfonas and BFBs). To determine whether these events were ecDNA-forming SVs, AmpliconArchitect (v.1.2)17 was run using default parameters on tumour BAMs downsampled to 10x, considering only intervals with integer CNs greater than 4 (inferred by JaBbA) and longer than 10âkb. For two runs on which these parameters could not be applied, we used 10x downsample, CNâ>â5 and 100-kb intervals. Any high-level amplifications detected by JaBbA overlapped with an AmpliconArchitect-detected cyclic amplicon were nominated as ecDNA-forming SVs.
Kataegis and kyklonas identification
We ran the SigProfilerClusters (v.1.1.2)103 software with default parameters to identify kataegis loci, computing a sample-dependent inter-substitution distance of clustered mutations and requiring a kataegis event to have a consistent VAF. Kataegis events contained completely within the footprint of an ecDNA-forming SV are classified as kyklonas.
Relative timing of APOBEC and chemotherapy mutations on ecDNA-forming SVs
As a secondary method for determining the relative contributions and timing of mutational processes acting on ecDNA, we partitioned SNVs in ecDNA-forming SVs by VAF into three groups (VAFââ¤â0.333, 0.333â<âVAFââ¤â0.667 and VAFâ>â0.667) for each sample. We then ran deconstructSigs as described above and VAF combination, requiring that a particular combination have at least 50 SNVs. The relative contribution of APOBEC and chemotherapy mutageneses on ecDNA-forming SVs are presented in Fig. 2f, and all mutational processes in Extended Data Fig. 6a.
Analysis of CCND1 ecDNA amplification in TCGA/PCAWG pan-cancer cohorts
The AmpliconArchitect analyses from a previous study22 were accessed from GitHub. Samples were from both TCGA and PCAWG datasets (based on their identifiers, either SA-XXXX or TCGA-XXXX). Sample barcodes from all specimens were analysed for amplicon intervals corresponding to the genomic location of CCND1 (chromosomeâ11: 69455855â69469242). The resulting amplicon intervals were then matched to the type of amplification event (heavily rearranged, circular-BFB or linear) as determined by AmpliconArchitect. Only one CCND1 amplification event was identified in cervical cancer and none in BÂ cell lymphoma, glioblastoma, sarcoma, renal or colorectal cancer. The CDKN2A and CCND1 mutation and CN status of bladder cases having CCND1 amplicons were retrieved from cBioPortal. The study abbreviations for different cancer types are listed on the National Cancer Institute website (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations).
RNA sequencing analysis
STAR aligner (v.2.7.3a)104 was run in two-pass mode (–twoPassMode Basic105) versus GRCh38. The STAR index was created using Gencode (v.39) for splice junction annotation, with overhang=99, and including only canonical chromosomes (chromosomes 1â22, X, Y and M). Gencode (v.28) was used for gene quantification. The GTF was filtered to include only genes from the following types (âgene_typeâ field): protein_coding, lincRNA, antisense, IG_LV_gene, IG_V_gene, IG_V_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_C_gene, IG_C_pseudogene, TR_V_gene, TR_V_pseudogene, TR_D_gene, TR_J_gene, TR_J_pseudogene, TR_C_gene. This filtering was performed using the âmkgtfâ utility from CellRanger (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#mkgtf). Transcripts annotated as âretained_intronâ were also excluded, using the script collapse_annotation.py from the GTex pipeline (https://github.com/broadinstitute/gtex-pipeline/tree/master/gene_model) with argumât –collapse_only. This filtering was performed to avoid under-counting due to overlap for instances in which genes from these categories overlapped other gene types (like pseudogene versions of the same gene) or in which genes would otherwise be non-overlapping except for âretained_intronâ isoforms. The featureCounts106 module from the subread package (v.1.4.3-p1) was used for read quantification, with the argument â-s 2â for strand-specific samples and argument â-s 0â for non-strand-specific samples. The module ComBat_seq107 from the sva package (R v.4.1.3, sva v.3.42.0) was used for batch correction. The samples were treated as four batches based on prep type, with a 2âÃâ2 design for âtotalRNAâ or physical rRNA removal versus âmRNAâ or poly-A selection, and strand-specific versus non-strand-specific. The 39 samples with corresponding DNA were included. DNA and RNA were extracted from the same tumour sample within the same patient except when tumour tissues were not sufficient for both. DESeq2 (R v.3.6.1, DESeq2 v.1.24.0)108 was used for library size normalization after batch correction. DESeq2 was also used to perform differential expression between samples with versus without gain of CCND1 CN. Gain is defined as CN gain relative to ploidy.
ecDNA enrichment statistical analysis in the WCM-UC cohort
The co-occurence of CCND1 amplification and homozygous loss of CDKN2A was determined using Fisherâs exact test (fisher.test in R v.4.0.0). CCND1 amplification was defined as having a gene modal copy number (CN) greater than two times the sample ploidy or greater than 8. Homozygous loss of CDKN2A was defined as having a gene modal CN of zero. Separate Fisherâs exact tests were performed within CCND1 amplifications, splitting by ecDNA-forming SV status, and across the entire cohort, comparing CCND1 ecDNA-forming SV+ to CCND1 ecDNA-forming SVâ samples.
Overall survival analysis
To determine the impact of CCND1 alterations on clinical outcomes, we queried data from 10,953 patients (10,967 samples) across 32 cancer types included in the Pan-Cancer Atlas109 from cBioPortal (https://bit.ly/4cjAYof). Out of 10,622 patients with overall survival data available, 2,128 patients had CCND1 alterations, including CN gains and CN amplifications. We performed survival analysis using the built-in survival function of cBioPortal.
UCSC Genome browser view of shared CCND1 ecDNA-forming SV regions
The JaBbA amplicon footprints corresponding to each of the putative CCND1 ecDNA-forming SVs were intersected, which produced a single 204âkb genomic interval (chromosomeâ11: 69614476â69818943). This interval was uploaded as a custom track in UCSC Genome Browser101. Additional tracks selected included NCBI RefSeq gene models (release 2023-03-21), FANTOM5 transcription start site peaks86, and H3K27Ac, H3K4Me1 and H3K4Me3 marks from ENCODE110. A tutorial on how to operate the interactive browser interface is available at GitHub (https://github.com/mskilab-org/gGnome.js/blob/master/README.md#the-ggnomejs-interface).
Visualization of ecDNA-forming SV reconstructions
For each putative ecDNA-forming SV (for example, JaBbA-called tyfonas, DMs or BFBs overlapping an AmpliconArchitect cyclic call), the amplicon-associated subgraph was extracted from the full JaBbA genome graph. Each amplicon was then decomposed into constituent walks using the âpeelâ function in the Râpackage gGnome (commit c390d80), using junction CN to rank walks. In brief, âpeelâ attempts to iteratively decompose a subgraph by finding walks through the graph that maximize some junction feature (here, JCN). For each iteration, the top-ranked walk was âpeeledâ off of the subgraph, and the procedure was repeated.
After obtaining a set of walks for each amplicon, the non-cyclic walks and walks containing genomic intervals with CNs fewer than eight were removed. Each walk that passed the aforementioned filters was then compared against the associated AmpliconArchitect cyclic event. The best overlapping walk was then kept, and the others were discarded. Finally, walks with reciprocal overlap â¥80% were selected for visualization with CycleViz (v.0.1.5)111 (https://github.com/AmpliconSuite/CycleViz). The preset BUSHMAN was used for displaying oncogenes contained within ecDNA reconstructions.
ONT library preparation and sequencing
Samples were fragmented to 10âkb using Covaris g-TUBE (COVARIS, 520079). DNA repair and end-prep were performed using the NEBNext Companion Module for ONT (NEB, E7180S), incubating at 20â°C for 5âmin, followed by 5âmin at 65â°C. The sample was cleaned up using 1à Ampure XP beads (Beckman Coulter, A63881) and washed twice on a magnetic rack with 80% ethanol. Sequencing adaptors and ligation buffer from an Oxford Nanopore Ligation Sequencing kit (ONT, LSK1114) were ligated to DNA ends using Quick T4 DNA ligase (NEB, E6056) for 45âmin at room temperature. The sample was cleaned up using 0.45à Ampure XP beads (Beckman Coulter, A63881), washing twice on a magnetic rack with the long-fragment buffer (ONT, LSK114) before eluting in 32âµl of elution buffer (ONT, LSK114). Sequencing libraries were prepared by adding the following to the eluate: 100âµl sequencing buffer (ONT, LSK114), 68âµl loading solution (ONT, LSK114) and 0.5âµl sequencing tether (ONT, LSK114). Samples were run on a PromethION (R10.4.1) flow cell using the PromethION sequencer. Sequencing runs were operated using MinKNOW software (v.23.07.12). Live base-calling was performed in high accuracy (HAC) mode. Read N50 was 12,186âbp, and mean coverage was 58Ã.
Assembly of circular ecDNA contigs
ONT reads were aligned to GRCh38 with minimap2 (v.2.26-r1175)112 with flags -aâL -âD –cs -x map-ont, and coordinate-sorted with samtools (v.1.18)113. Reads overlapping the amplified regions of interest were extracted to FASTQ using samtools and used as input into assembly. A total of 196,924 long reads overlapped CCND1 ecDNA reference coordinates with N50 10,087 and mean coverage 1,403Ã. GRCh38 ALT contig chr11_KI270827v1_alt overlapped one of the regions of interest and had many reads aligned to it instead of the corresponding locus on chromosomeâ11, so these were included in the assembly for the relevant amplicon. Reads were assembled with Flye (v.2.9.2-b1786)114 in metagenomics mode, with the ânano-hq setting and âread-error set to 0.03 as recommended for ONT Q20+ chemistry. A single round of polishing was used. Following assembly, the Flye assembly summary was reviewed to confirm that a circular contig was assembled. The assembled contig was mapped back to GRCh38 using minimap2 with assembly alignment preset -ax asm5 for confirmation that the assembly footprint on the reference genome matched the amplicon detected using Illumina sequencing.
Cell culture
Bladder cancer cell lines UMUC3 (CRL-1749) and 5637 (HTB-6), and HEK293T (CRL-3216) were purchased from the American Type Culture Collection. The SF295 cell line was purchased from Neuromics (SF-001). Cell lines UMUC3, 5637 and SF295 were validated by STR testing and/or morphology by the manufacturer. The UMUC3, SF295, 5637 and HEK293T cells were incubated in a humidified constant 37â°C and 5% CO2 incubator in EMEM (ATCC), RPMI-1640 (Gibco) (for SF295 and 5637) and DMEM (Gibco) medium, respectively, with 100âUâmlâ1 penicillinâstreptomycin (Gibco) and 10% tetracycline-free FBS (Omega Scientific). The cells were tested negative for Mycoplasma contamination with a PCR detection kit (ABM) according to the manufacturerâs instructions.
Plasmids for episomal ecDNA and shRNA knockdown
The non-integrating lentiviral episomal vector we used in this study has homogeneous long terminal repeat regions that promote the circularization of the transduced CCND1 DNA. As this vector is integrase-deficient, it models the extrachromosomal configuration of CCND1 ecDNA. Finally, the scaffold/matrix attachment region (S/MAR) sequence, which interacts with the nuclear matrix through scaffold attachment factor A (SAF-A), ensures the self-replication and partitioning of the episome during mitosis115. The pBMN (CMV-copGFP-Puro-SMAR) plasmid, a gift from M.âEssand (Addgene, 80391)116, was modified to include a mCherry-T2A-5â²HA-tagged CCND1 coding sequence, synthesized with XbaI and AgeI sites by Integrated DNA Technologies. Following digestion with XbaI and AgeI, the fragment was cloned into the pBMN vector between the CMV promoter and Puro-SMAR sequence using T4 DNA ligase (New England BioLabs) (Supplementary Figs. 1â5). This plasmid was transformed into One Shot Stbl3 Chemically Competent Escherichia coli (Thermo Fisher Scientific) and collected using a PureLink HiPure Plasmid Filter Maxiprep kit (Thermo Fisher Scientific), according to the manufacturerâs instructions. The shRNA CCND1 knockdown plasmid (pLV-mCherry-T2A-Puro-U6-shRNA) and shRNA control plasmid (pLV-EGFP-T2A-Puro-U6-scramble control) were generated by VectorBuilder. The target CCND1 sequence was TGTTGTAAGAATAGGCATTAA. The plasmids were validated by Sanger sequencing.
Cell line generation for episomal ecDNA and shRNA knockdown
HEK293T cells were transfected with the interested plasmid in combination with pPAX2 and pVSV-G plasmids (Addgene). The virus-containing medium was collected every 24âh for 3âdays post-transfection, filtered using a 0.22-µm sterile 50âml tube top filter (Corning) and concentrated using Lenti-X Concentrator (Takara). The concentrated lentivirus was used for transduction with 10âµgâmlâ1 polybrene (Millipore Sigma). Transduced cells were selected using 2âµgâmlâ1 puromycin (Gibco). The expression level was validated by SDSâPAGE western blotting of cyclinâD1.
Competition assay tracking using the Incucyte living-image system
The GFP-tagged pBMN empty vector UMUC3 cells were co-cultured with mCherry-tagged pBMN CCND1 episomal ecDNA UMUC3 cells in a 1:1 ratio, initially seeded into a 6-well plate. Likewise, scramble shRNA SF295 cells were co-cultured with CCND1 knockdown SF295 cells in a 1:1 ratio, initially seeded into a 6-well plate. After 24âh of seeding, the co-cultured cells were treated with 1.5âµM cisplatin or vehicle control. Cisplatin (at the stock concentration of 1âmgâmlâ1) was freshly dissolved in medium instead of DMSO to prevent loss of activity117 and was protected from light. Each 6-well plate was then placed in the Incucyte system (Sartorius) for live tracking of GFP and mCherry signals every 6âh. To sustain the selective pressure of cisplatin, the medium containing cisplatin was refreshed after 48âh. For replicates, 25 spots within each well were imaged. After 96âh of monitoring, GFP and mCherry signals were quantified using Incucyte software (Incucyte, 2022B, Rev2). Subsequently, the ratio of mCherry and GFP signal area was calculated and normalized to the baseline (timeâ0) measurements.
Single-cell RNA sequencing
The competitive assay experiment was repeated in T75 flasks for single-cell RNA sequencing to identify the impact of CCDN1 episomal ecDNA on the transcriptional profiles under selective pressure. Following a 96-h exposure to either vehicle or cisplatin (1.5âµM), live cells tagged with GFP and mCherry were isolated using fluorescence-activated cell sorting (BD FACSAria II) and subsequently processed by the WCM genomic core facility. The 10x Genomics chromium single-cell 3â² library RNA sequencing kit was used to construct libraries, and sequencing was performed using Illumina NovaSeqXplus with paired-end sequencing, producing readâ1 of 28âbp and readâ2 of 90âbp.
Single-cell RNA sequencing analysis
Alignment
The obtained sequencing data underwent processing through 10x cellranger (v.7.1.0) with the â –include-intronsâ option set to false. Specifically, cellranger mkfastq was used to generate Fastq files. For the alignment, we used the 10x prebuilt GRCh38-2020-A human reference.
Preprocessing
Raw single-cell RNA sequencing results of four samples (vehicleâGFP, vehicleâmCherry, cisplatinâGFP and cisplatinâmCherry) were processed using CellRanger (v.7.1.0). Matrix files for each pair of samples (GFP and mCherry cells under the same treatment) were read with Scanpy (v.1.9.6)118 and concatenated in the same AnnData object (v.0.10.3)119. Cells were filtered to have at least 200 genes and a mitochondrial count below 20%. Genes were filtered to those present in at least three cells. Transcriptomes were normalized by their total size and scaled to 10,000 UMI counts, and highly variable genes were selected using Scanpy pp.highly_variable_genes function with batch_key=âsampleâ. The data matrix underwent logarithmic transformation using the natural logarithm using Scanpy pp.log1p function. We scaled the gene expression per gene by subtracting the mean and dividing by the standard deviation using the Scanpy pp.scale function. Then we performed principal component analysis using the Scanpy tl.pca function with svd_solverâ=ââarpackâ and performed batch correction using Harmony120.
E2F score calculation
The E2F score was calculated in each single cell by calculating the average expression of all 200 genes in the E2F target pathway from the Molecular Signatures Database (MSigDB)121 subtracted the average expression of a reference set of genes using the Scanpy tl.score_score_genes function. We constructed a k-nearest neighbour graph using the pp.neighbors function from Scanpy, setting the number of neighbours to 25 and the number of principal components to 40. We then applied UMAP for dimensionality reduction using the Scanpy tl.umap function for each anndata pair (GFP and mCherry cells under the same treatment). For visualizing E2F transcription factor activity across individual cells, we generated a UMAP colour-coded by the pre-calculated E2F score to differentiate between low and high E2F activity states. We then plotted E2F scores as boxplots and statistically compared the E2F score between each pair of samples using the MannâWhitney test.
Differential gene expression and gene set enrichment analysis
To compare the transcriptomic profiles between each pair of samples (GFP and mCherry cells under the same treatment), we performed differential expression of genes on pre-processed anndata between GFP and mCherry samples under each treatment using t-test as implemented in Scanpy tl.rank_genes_groups function. Gene enrichment analysis of significantly upregulated and downregulated genes was done using the GSEApy package (v.1.1.1) with the GSEApy prerank function. Differentially regulated pathways were identified using gene set enrichment analysis from the GSEA (Broad Institute) package based on signatures from MSigDB (MSigDB_Hallmark_2020)121. Bubble plots of enriched gene sets were generated based on NESs and associated FDR qâvalue.
Cell cycle analysis
The UMUC3 bladder cancer cells were seeded in a 6-well plate and cultured for 24âh, followed by overnight synchronization in serum-free medium. Cells were then treated with either EMEM with 10% FBS or EMEM with 10% FBS with 1.5âµM cisplatin, freshly prepared without DMSO. After 24âh, cells were collected, resuspended in 200âµl of ice-cold PBS, and dropwise fixed with 3âml of 80% ice-cold ethanol while vortexing to minimize clumping, then stored overnight at â30â°C. The fixed cells were washed with PBS to remove the ethanol and stained with a DAPI solution (1âmgâmlâ1, BD Pharmingen, diluted 1:1,000 in 0.1% Triton X-100 PBS) for 30âmin at room temperature in the dark. Stained cells were analysed on a BD LSRFortessa flow cytometer (BD Biosciences) using a 405ânm laser with a 450/50ânm emission filter. The DNA content was analysed using FlowJo (BD Biosciences), for which cell gating was based on forward and side scatter to isolate single cells, and cell cycle proportions were calculated using the DeanâJettâFox model122.
FISH analysis
CCND1 (labelled green) and centromeric chromosomeâ11 (labelled orange) probes were used (Empire Genomics). Before use, all probes were validated on metaphase spreads.
To assess extrachromosomal CCND1 localization, the bladder cancer cell line UMUC3 and the human glioblastoma cell line SF295 were first treated with colcemid (0.1âµgâmlâ1) for 1âh to obtain a metaphase preparation. A hypotonic solution was added, then cells were fixed in methanol and acetic acid (3:1). The fixed cells were placed on microscopy glass slides and dried for 1âweek at room temperature. Prepared slides were dehydrated in a series of ethanol. Probes were then added to the slides, which were denatured and hybridized, washed and then counterstained with DAPI before visualization. CCND1 was considered present in ecDNA when the green signal was observed outside the chromosome spread, and the centromeric chromosomeâ11 served as control2.
To assess gene amplification, CCND1 enumeration from interphase FISH was carried out on FFPE tissue slides (5âµm thick). CCND1 amplification was determined by a ratio of CCND1 gene signal (green) to centromere signal (orange) higher than two. A minimum of 100 nuclei per slide were scored. Analysis was carried out using a fluorescence microscope (Olympus BX51; Olympus Optical). CytoVision (v.7.3.1) software (Leica Biosystems) was used for imaging123,124.
Preparation of metaphase spreads
Cells were grown to 80% confluency in a 15-cm dish and metaphase-arrested by adding KaryoMAX colcemid (10âµlâmlâ1, Gibco) for 1â2âh. Cells were washed with PBS, trypsinized (Gibco) and centrifuged at 200g for 10âmin. We added 10âml of 0.075âM KCl preheated to 37â°C, 1âml at a time, vortexing at maximum speed in between. Afterwards, cells were incubated for 20âmin at 37â°C. Then, 5âml of ice-cold 3:1 methanol and acetic acid (kept at â20â°C) was added, 1âml at a time, followed by resuspension of the cells by flicking the tube. The sample was centrifuged at 200g for 5âmin. Addition of the fixative followed by centrifugation was repeated four times. Two drops of cells within 200âµl of methanol and acetic acid were dropped onto prewarmed slides from a height of 15âcm. Slides were incubated overnight.
SDSâPAGE western blotting
Cells were lysed with RIPA buffer with added protease inhibitor (Thermo Fisher Scientific) to produce whole protein lysate. Protein concentration was determined using a Pierce BCA assay (Thermo Fisher Scientific). The lysate was run on a 10% SDSâPAGE gel with MOPS buffer and transferred using an iBlot system (Thermo Fisher Scientific). Rabbit anti-cyclin D1 (Abcam, ab134175, 1:2,000), mouse anti-α-tubulin (Millipore, 05-829, 1:1,000), mouse anti-Rb (Cell Signaling Technology, 9309S, 1:1,000), rabbit anti-phospho-Rb (Ser807/811) (Cell Signaling Technology, 8516S, 1:2,000) and rabbit anti-p16INK4A (Cell Signaling Technologies, 80772, 1:1,000) antibodies were separately used as primary antibodies at 4â°C overnight. HRP-conjugated secondary antibodies (goat anti-rabbit, 32260, and goat anti-mouse, 32230, Invitrogen, 1:1,000) were incubated for 1âh at room temperature. Blots were developed using Luminata Forte Poly HRP substrate (Millipore) reagent, imaged on a ChemiDoc imager (Bio-Rad) and analysed using Image Lab (Bio-Rad v.6.1.0) software.
Establishing stable 5637 cell line with doxycycline-inducible APOBEC3A expression
All-in-one doxycycline-inducible lentivirus backbone included APOBEC3A cDNA with HA-tag and Venus with continuously expressing promoter was a gift from M.âD. Weitzmann125,126. The lentivirus was transduced into the 5637 bladder cancer cell line, followed by the fluorescence-activated cell sorting of Venus-positive cells. APOBEC3A expression level was validated by SDSâPAGE western blotting using a HA-tag antibody.
Immunofluorescence microscopy
Cells were plated on Chamber slides (ThermoFisher Scientific) and treated with vehicle (DMSO) or doxycycline for 24âh. Cells were fixed in 4% paraformaldehyde at room temperature for 10âmin. The fixed nuclei were treated with 0.5% Triton X-100 in PBS and blocked by 2.5% normal goat serum in PBS. Mouse anti-phospho-histone H2A.X (Ser139) antibody (Cell Signaling Technology, 80312, 1:1,000) and rabbit anti-phospho RPA32 (S4/S8) antibody (Bethyl Fortis Life Sciences, A300-245A, 1:1,000) were used as the primary antibodies and incubated at 4â°C overnight. Then, AlexaFluor 488-conjugated AffiniPure donkey anti-rabbit antibody (Jackson ImmunoResearch Laboratories, 711-545-152, 1:1,000) and AlexaFluor 594-conjugated AffiniPure donkey anti-mouse antibody (Jackson ImmunoResearch Laboratories, 715-585-150, 1:1,000) were used as secondary antibodies and incubated for 2âh at room temperature. Negative blank controls were included by omitting primary antibodies in the protocol. After that, the slide was sealed with ProLong Diamond Antifade mountant with DAPI (Invitrogen, p36962). High-resolution imaging was performed using a fluorescent microscope (Axioscop 2 M2 with Plan Apochromat 633/1.4 NA oil differential interference contrast objective; Carl Zeiss) with a camera (CoolSNAP HQ; Photometrics) using the AF488 filter (38 HE GFP) and AF594 filter on the Axio Imager M2. Multiple fields of view for each condition were randomly picked for imaging. Z stacks (0.25âµm) were collected and subjected to constrained iterative deconvolution using Zeiss deconvolution software (Zen desk v.3.7). Images were processed using Fiji ImageJ (v.154f) software127. Each image was stacked using the Z-stacked function of the software with the âMax intensityâ property. The number of nuclei in each image was manually counted in the blue DAPI channel, discarding cells with areas less than 80% present in the image. Cells were considered pRPA-positive if they formed â¥5 discrete foci or γH2AX-positive if they formed â¥5 discrete foci within nuclei128. MannâWhitney test was used to statistically compare percentages of positive nuclei between the two treatments using GraphPad Prism (v.10.2.0).
Statistical tests
Two-sided Wilcoxon rank-sum test, two-sided t-test or Pearson test for continuous variables was performed using R (v.4.0.0) software. Unless specified otherwise, Pâ<â0.05 was considered significant.
Ethical approval
The studies involving human participants were reviewed and approved by the WCMÂ IRB. Patients and participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.