Mapping convergent regulators of melanoma drug resistance by PerturbFate

Cell culture

A375, HEK293T, Sk-Mel-3 and Sk-Mel-24 cell lines were obtained from the American Type Culture Collection. A375 and HEK293T cells were maintained in high-glucose Dulbecco’s modified Eagle’s medium supplemented with L-glutamine, sodium pyruvate (Gibco; 11995065) and 10% fetal bovine serum (FBS) (Sigma-Aldrich; F4135). Sk-Mel-3 cells were maintained in McCoy’s 5A medium (Gibco; 16600082) supplemented with 10% FBS, and Sk-Mel-24 cells were maintained in Eagle’s minimum essential medium (American Type Culture Collection; 30-2003) supplemented with 15% FBS. All cells were cultured at 37 °C with 5% CO₂.

Cell strain construction

To preserve the potential clonal diversity of cancer cells, we adopted a simplified engineering strategy for the A375 cell line, rather than transducing Tet-On dCas9–KRAB–MeCP2 followed by monoclonal selection, as in our previous study¹⁵. Inspired by the Genome-Wide Perturb-Seq study⁸, we selected dCas9–BFP–KRAB as the CRISPRi effector. The A375–dCas9–BFP–KRAB cell strain was generated by transducing A375 cells with lentivirus produced from pHR–SFFV–dCas9–BFP (Addgene; 46910) at MOI of 0.2. BFP⁺ cells within the top 15% of fluorescence intensity were sorted and expanded. Enrichment sorting was performed regularly to maintain Cas9 expression levels in the cell strain.

Plasmid and lentivirus construction

Single sgRNA expression construct cloning and lentivirus packaging were performed as in our previous study¹⁵. In brief, the CROPseqOPTI vector with an additional EGFP marker (CROPseqOPTI–GFP) was digested with BsmBI (New England Biolabs; ER0451) to remove the filler sequence between the U6 promoter and the sgRNA scaffold. Pre-annealed phosphorylated double-stranded DNA with an sgRNA sequence and complementary sticky ends was ligated into the digested backbone. Lentivirus was packaged by co-transfecting the sgRNA-expressing plasmid, psPAX2 plasmid and pMD2.G plasmid into HEK293T cells at high confluency. After washing the cells and changing the medium, virus-containing supernatants were collected 24 h later. The functional titre of the viruses was measured by assessing the proportion of fluorescent-marker-positive cells after transducing target cell lines with varying amounts of lentivirus.

A two-step cloning strategy was adopted for the construction of our dual-sgRNA drug resistance library. The oligo library was designed with the following structure: BsmBI digestion site–sgRNA1 sequence–PaqCI digestion site–filler–PaqCI digestion site–sgRNA2 sequence–BsmBI digestion site. The single-stranded library (Integrated DNA Technologies) was amplified into double-stranded DNA using primers complementary to both ends and then digested with BsmBI. The amplified library with sticky ends was ligated into the BsmBI-digested CROPseqOPTI–GFP backbone using Blunt/TA Ligase Master Mix (New England Biolabs; M0367S). The ligation product was purified with 0.75× AMPure beads (Beckman Coulter; A63881) and transformed into competent cells through electroporation. After recovery, cells were spread onto a bioassay agarose plate and incubated at 32 °C for 14 h. A separate agarose plate at a 1:100,000 dilution was inoculated to determine sgRNA coverage. All colonies were scrapped, and a Midiprep was performed to obtain the intermediate plasmid library.

The intermediate plasmid library and a gBlock (Integrated DNA Technologies) with the structure ‘PaqCI digestion site–sgRNA scaffold CR3 with a 10× capture sequence–H1 promoter–PaqCI digestion site’ were digested with PaqCI (New England Biolabs; R0745S). Ligation, transformation and plasmid preparation were performed as described above.

The full structure of the sgRNA expression cassette in the final library is U6 promoter–sgRNA1 sequence–sgRNA scaffold CR3 with a 10× capture sequence–H1 promoter–sgRNA2 sequence–sgRNA scaffold. The accuracy of plasmid library construction and the abundance of each sgRNA pair were evaluated using amplicon sequencing.

Dual-sgRNA-expressing CROPseqOPTI–mCherry plasmids were constructed for specific gene knockdown experiments. In brief, gBlocks with the structure ‘homology arm to the U6 promoter sequence–sgRNA1 sequence–sgRNA scaffold CR3 with a 10× capture sequence–H1 promoter–sgRNA2 sequence–homology arm to the second sgRNA scaffold’ were mixed with the BsmBI-digested CROPseqOPTI–mCherry backbone for Gibson assembly. After incubation at 50 °C for 30 min, transformation and plasmid preparation were carried out as described in our previous study¹⁵.

Gene selection and sgRNA evaluation

To identify potential gene candidates in which loss of function might confer resistance to Vem in melanoma cells, we leveraged previously published bulk CRISPR screens with similar experimental settings^18,19,20,52, selecting genes with sgRNAs significantly enriched (FDR < 0.1) following Vem treatment on the basis of the statistical significance reported in each study. This yielded 183 genes, which were further filtered by endogenous expression levels; 143 genes with expression of 1 log₂ CPM or higher in our in-house A375 EasySci-RNA data were retained.

We referred to the sgRNA library from Genome-Wide Perturb-Seq⁸, which also used a paired sgRNA expression system, although with a different chemistry. We further examined the sgRNAs and retained only those meeting the following criteria: (1) unique STAR mapping to the reference genome (GRCh38.p13; GENCODE); and (2) knockdown fold change of less than 0.5 (the fold change of target gene expression between cells with a target sgRNA and cells without a target sgRNA), as reflected in the Genome-Wide Perturb-Seq gene-expression data. For target genes not found in the Genome-Wide Perturb-Seq or in cases in which sgRNA inefficiency was observed in the original dataset, we used newly designed paired sgRNA libraries from a previous study²¹.

CRISPR screen experimental procedure

Two CRISPR screen replicates were performed in parallel, with viral transduction, cell passaging and sample processing carried out independently for each. In each replicate, five million A375–dCas9–BFP–KRAB cells were seeded into a 10-cm dish. The next day, the drug-resistance lentivirus library was added to the dish at an MOI of 0.05 with 8 ng ml⁻¹ of Polybrene (Sigma-Aldrich; TR-1003). GFP⁺ BFP⁺ cells were sorted and reseeded into a 6-cm dish for expansion. After 48 h, the medium was replaced with fresh Dulbecco’s modified Eagle’s medium containing either 0.1% DMSO (VWR; IC0219605525) or 1 μM Vem in DMSO (Selleck; S1267) to initiate the screen (day 0). The culture medium was refreshed every other day.

On day 6, for the PerturbFate experiment, 2.5 ml of the original medium was removed from the dish, the remaining medium was aspirated and the removed medium was mixed with 2.5 μl of 200 mM 5-EU (Vector Laboratories; CCT-1261). The mixture was then added back to the dish for a 1-h incubation at 37 °C. Cells were subsequently trypsinized and collected for single-cell library preparation. To minimize the impact of experimental handling on nascent transcriptomes, cells were processed on ice immediately after collection and fixed promptly. Detailed step-by-step PerturbFate protocols are provided in the Supplementary Information.

For the bulk screen, cells were trypsinized and collected on days 6 and 14. Cell pellets were stored at −20 °C until further processing. Genomic DNA was extracted from each sample (ZYMO; D3024) and split equally into several PCRs, with a maximum of 1 μg of DNA per reaction for sgRNA expression cassette amplification. After 20 cycles of amplification, the PCR products were purified (ZYMO; D4014), and the target band was extracted from an agarose gel (ZYMO; D4007) for amplicon sequencing.

Validation secondary screens were performed similarly to the method stated above. When additional genetic perturbations were involved, A375 cell lines with the target gene knocked down were established before the screen. Specific target sgRNAs were transduced using CROPseqOPTI–mCherry to prevent fluorescent conflicts with the PerturbFate sgRNA library, which expresses GFP. Additional chemicals included SIS3 (Sigma-Aldrich; 566405-1MG), SR18662 (MCE; HY-136530) and lenvatinib (MCE; L125518-5MG). Chemicals were reconstituted in DMSO and added to the culture medium at a 1:1,000 dilution ratio in combination with Vem. In the supplementary screen involving additional antibody treatments, 50 μg ml⁻¹ of human immunoglobulin G isotype (MCE; HY-P99001) or VGX-100 (MCE; HY-P990675) was added to the medium together with Vem, and the medium was changed every other day.

Sequencing data processing

For the bulk CRISPR screens, BCL files were demultiplexed into FASTQ files on the basis of index 7 barcodes. Reads from each sample were further extracted using index 5 barcodes. Each read pair was matched against two constant sequences (Read1, 11–25 nt; Read2, 11–25 nt) to retain reads with the correct library structure, allowing for a maximum of one mismatch. Then, sgRNA sequences were extracted from filtered read pairs (sgRNA sequence 1, 25–44 nt of Read1; sgRNA sequence 2, 84–103 nt of Read2), and sgRNA identities were assigned on the basis of the reference sgRNA sequences, permitting no mismatches. The sgRNA count matrices were constructed using Python 3.8.

For PerturbFate (whole transcriptome + chromatin accessibility + sgRNA), BCL files were demultiplexed into FASTQ files using P7 PCR barcodes. Each FASTQ file contained all RNA and ATAC reads from the same indexing PCR well. During barcode extraction, reads from the oligo-dT/randomN RNA and ATAC modalities were split into separate files on the basis of modality-specific adapter sequence matching. Read1 matching the ‘TG’ sequence at the 41–42 nt position was classified as transcriptome data. Oligo-dT-originated RNA reads required at least ten consecutive T bases at positions 51–60 nt, whereas randomN-originated RNA reads did not exhibit this pattern. ATAC read pairs were identified by the Tn5 ME sequence at the 41–50 nt position of Read1.

For the RNA reads, cell barcodes and UMI sequences from Read1 were appended to the headers of each Read2 FASTQ file, and only Read2 was used for subsequent steps. Poly(A) sequences, adapter sequences and low-quality bases were trimmed from Read2 of oligo-dT reads, whereas 14 additional bases (potential priming hexamer and UMI sequences) were removed from randomN reads using Trim_Galore (v.0.6.7). After this step, oligo-dT and randomN reads were combined for further processing. STAR (v.2.7.9a) was used to map reads to the hg38 reference genome (GRCh38.p13; GENCODE). Reads with a mapping score of 30 or higher were selected using SAMtools (v.1.13). Single-cell deduplication was performed on the basis of UMI sequences and alignment locations, and retained reads were split into SAM files per cell. Finally, single-cell gene-to-cell raw count matrices were constructed by assigning reads to genes.

For the ATAC reads, cell barcodes from RNA Read1 were appended to the headers of both ATAC Read1 and Read2 FASTQ files. Adapter sequences were removed in paired-end mode using Trim_Galore (v.0.6.7). STAR (v.2.7.9a) was used to map read pairs to the hg38 reference genome (GRCh38.p13; GENCODE). Picard (v.2.27.4) was used to remove PCR duplicates, and TSV files containing genomic locations and full cell barcodes were generated for SnapATAC2 (v.2.6.1)⁵³.

The sgRNA reads were matched against constant sequences in Read1 and Read2, allowing a maximum of one mismatch. For each filtered read pair, the cell barcode, sgRNA sequence and UMI were extracted from designed positions. Extracted sgRNA sequences with at most one mismatch from the sgRNA library were corrected. UMI sequences were used for deduplication by collapsing identical UMIs for each individual sgRNA under a unique cell barcode. Cells with sgRNA UMI counts exceeding five were retained, and the sgRNA × cell count matrix was constructed.

For PerturbFate (nascent transcriptome + pre-existing transcriptome + chromatin accessibility + sgRNA), most computational steps were the same as described above, with minor modifications. In this protocol, nascent and pre-existing transcriptomes from the same cell were amplified separately using distinct P7 PCR barcodes and processed as separate cells (two groups of reads with distinct cell barcodes). These were matched after single-cell gene-to-cell raw count matrices were generated. Additionally, because ATAC DNA underwent secondary tagmentation, the pipeline was modified to process single-end Read1 of ATAC fragments for chromatin accessibility quantification. The sgRNA library from the same cell was amplified from both nascent and pre-existing transcriptome PCR wells but shared a unique cell barcode for matching, achieved by adding the same barcoded sgRNA inner P7–U6 primer to paired nascent and pre-existing transcriptome PCR wells.

ATAC peaks were called for each treatment condition with a q-value cutoff of 0.01 using MACS3 (v.3.0.0b3)⁵⁴ embedded in SnapATAC2 (v.2.6.1). Peaks were resized to 500 bp centred at the summit and iteratively merged into a union peak set, resulting in 758,365 peaks in the final set. A peak-to-cell raw count matrix was generated using SnapATAC2 and converted into a sparse matrix in R for downstream analyses. Normalized ATAC signal coverage bigWig files for genome track visualization were generated using deepTools (v.3.5.1)⁵⁵.

Measurement of growth phenotypes in bulk CRISPR screens

Raw sgRNA counts from the plasmid library, as well as from cells in the DMSO and Vem conditions, were normalized by the total counts to account for sequencing depth. Perturbations relative to NTC counts were calculated by dividing the depth-normalized sgRNA counts by the sum of the depth-normalized NTC counts. Relative counts from replicates were then averaged for growth phenotype calculations. To assess whether perturbed cells outgrew NTC cells under the Vem condition, fold changes in relative counts between the Vem library and the plasmid library were calculated and log₂-transformed (Vem log₂[sgRNA enrichment]). The Vem log₂[sgRNA enrichment] value indicates whether perturbed cells exhibit a growth advantage over NTC cells in response to Vem treatment. To further compare changes in cell growth under Vem treatment, the ratio of fold changes in relative counts between the Vem library and the DMSO library was calculated and log₂-transformed (Vem − DMSO log₂[sgRNA enrichment]). The Vem − DMSO log₂[sgRNA enrichment] value reflects whether perturbed cells gained a greater growth advantage in the Vem condition compared with the DMSO condition.

Identification of sgRNA singlets and examination of knockdown efficiency

We identified sgRNA singlets using two additional cutoffs: a max-to-total proportion of 50% or higher and a sec-to-max ratio of 0.3 or lower. The max-to-total proportion represents the percentage of UMI counts for the most abundant sgRNA relative to the total sgRNA UMI counts from a single cell. The sec-to-max ratio represents the ratio of UMI counts for the second most abundant sgRNA to the UMI counts for the dominant sgRNA in a single cell. We benchmarked our method against a recent Python package that implements several statistical inference methods for single-cell sgRNA identities⁵⁶ and observed extremely high consistency. We retained only knockdown populations with more than 50 cells in each treatment condition and those in which the pseudobulk expression level of the target gene was less than 60% of that observed in NTC cells. Notably, low knockdown efficiency for certain genes probably reflects suboptimal sgRNA selection, consistent with observations from our previous study¹⁵.

Differential gene expression and differentially accessible peak analyses

Whole-transcriptome differential expression analysis was performed as in our previous study^14,15,57. For each knockdown-to-NTC differential gene expression (DEG) test, cells with fewer than 100 detected genes and features present in fewer than 25 cells were filtered out. The analysis was conducted using Monocle 2 (v.2.34.0)⁵⁸. The differentialGeneTest() function was implemented with the number of genes detected per cell as a covariate. Fold change was calculated by dividing the average normalized single-cell gene-expression CPM in knockdown cells by that in NTC cells. Differentially accessible peaks (DAPs) were identified using the diff_test() function in SnapATAC2 with default parameters.

UMAP embedding on pseudobulk perturbations

Pseudobulk knockdown population whole-transcriptome matrices and ATAC peak count matrices were constructed by aggregating all cells with the same sgRNA identity within each treatment condition. Seurat (4.2.0; ref. ⁵⁹) was used for normalization, scaling, PCA dimensionality reduction and UMAP visualization. Single-cell level MAST differential expression analysis of knockdown cells versus NTC cells was performed to obtain gene features for dimensionality reduction. Perturbations with at least one significant DEG were retained, and the top 20 DEGs with the highest absolute fold changes from all tests were combined. The top 15 principal components were used for UMAP. For ATAC pseudobulk dimensionality reduction, 300,000 highly variable peaks were selected, as implemented in SnapATAC2. The top 30 principal components were used as input for UMAP.

ATAC peaks–nascent gene-expression linkage identification

The FigR (v.1.0.1) package⁶⁰ in R was used for the analysis. To account for the potential effects of genetic and chemical perturbations on global transcriptional activity, we used a specialized normalization strategy to adjust for sequencing depth. Although NTC cells in the DMSO condition were normalized by counts per 10,000, other perturbed cells were normalized using a scaling factor of 10,000 × (mean UMIs in this population/mean UMIs in NTC cells in the DMSO condition). Given that gene regulation patterns of non-coding genes remain largely understudied, we focused exclusively on protein-coding genes. Only protein-coding genes annotated in the hg38 GTF file (GRCh38.p13; GENCODE) were retained. The ATAC peak-to-cell raw count matrix was filtered to include only peaks detected in at least ten cells. The log-transformed, normalized nascent gene-expression matrix and the filtered ATAC peak-to-cell raw count matrix were used as input for the association test. Gene–peak linkages with P values (pvalZ) ≤ 0.05, as specified in the FigR manual, were retained as significant linkages.

Cell trajectory and nascent-transcriptome-based RNA velocity inference

Because whole transcriptomes of single cells remain the most robust single-cell modality, they were used for diffusion map construction and cell order (pseudotime) inference using the destiny package (v.3.20.0) in R⁶¹. To highlight the impact of genetic and chemical perturbations on the melanoma phenotype switch, well-established gene sets defining several melanoma phenotypic states²² were used as features. Genes expressed in at least 100 cells in each condition were retained for dimensionality reduction. The top 20 principal components were used to construct the diffusion map with the DiffusionMap() function, and cell order was subsequently calculated using the DPT() function. Phenotype gene set scores in single cells were calculated by averaging the scaled expression values of constituent genes, and cells were annotated according to the highest gene program score in each bin along the trajectory. To improve clarity and ensure the statistical robustness of downstream analyses, transient plastic phenotype labels were excluded when annotating cells in the DMSO condition. Instead, cells were categorized as one of ‘melanocytic’, ‘neural crest-like’ or ‘undifferentiated’. For Vem-treated cells, no phenotype labels were assigned; instead, we focused on their transitions along the trajectory, reasoning that their transcriptomic states underwent complex and subtle changes after treatment, making them unsuitable for categorization into untreated phenotypes.

To integrate single-cell nascent transcriptomes into the diffusion map, we assumed that cell-state transitions were not completed within 1 h. This assumption aligns with the observation that RNA half-lives in human cells are typically much longer than 1 h (ref. ⁶²). Whole-transcriptome gene count matrices of cells were concatenated with pre-existing transcriptome gene count matrices of cells (which had matched single-cell nascent transcriptomes), and a joint diffusion map was constructed. Manual curation revealed that DC2 and DC3 captured the phenotypic state of cells across both chemistries. We constructed a k-nearest-neighbour model using DC2 and DC3 values with the FNN package (v.1.1.4.1) in R⁶³. The inferred cell order value for each pre-existing transcriptome cell was assigned by averaging the real cell order values of its three nearest-neighbour whole-transcriptome cells. This approach demonstrated highly consistent sgRNA abundance distributions (Extended Data Fig. 6c).

The scVelo (v.0.3.2) package⁶⁴ in Python was used to infer RNA velocity from nascent transcriptomes. To match nascent-transcriptome to whole-transcriptome cells, we performed integration as described above. Each whole-transcriptome cell was matched to a nascent transcriptome by identifying the two nearest pre-existing transcriptome neighbours and averaging their log-transformed normalized nascent transcriptomes. Normalized whole-transcriptome and nascent-transcriptome gene-expression matrices, diffusion map coordinates of whole-transcriptome cells and principal components computed using the destiny package in R were imported into the adata object. Instead of using the unspliced-to-spliced ratio, we used the new-to-total ratio of genes⁶⁵ to compute velocities. The top 3,000 genes were selected, and the top 20 principal components were used to compute moments and connectivity. Velocities were generated in stochastic mode.

Significance examination of perturbed cells in the trajectory

The ks.test() function in R was used to identify perturbed cells that exhibited significant shifts along the trajectory compared with NTC cells in both conditions. Cell order values for each perturbation were compared with those of NTC cells in the corresponding treatment condition, and differences in mean cell order values between perturbations and NTC cells were used to determine the direction and extent of the shift. The computed two-sided P values were adjusted using FDR correction. For most analyses of Vem resistance, we retained only the perturbations in the Vem condition that showed consistent growth and molecular phenotypes (shift trends along the trajectory).

To compare the relative transition distance between NTC and other perturbations upon Vem treatment, the wilcox.test() function in R was used in a similar manner. For both DMSO and Vem treatment conditions, single-cell order values of perturbed cells were adjusted relative to the averaged cell order value of the NTC cell population, and the difference between these two distributions was examined using the wilcox.test() function followed by FDR correction.

Identification of DEGs, differentially accessible motifs and DAPs along the cell trajectory

The tradeSeq (v.1.20.0) package⁶⁶ in R was used to infer statistically significant DEGs, differentially accessible motifs and DAPs along the melanoma phenotype switch trajectory. For DEG identification in whole-transcriptome and nascent-transcriptome datasets, raw gene count matrices and cell order values inferred by the destiny package in R (for whole-transcriptome cells) or by the integration mentioned above (for nascent-transcriptome cells) were used as input to the fitGAM() function. Lineage weights for all cells were set to 1, as only a single trajectory was analysed. The statistical significance of associations between expression levels and cell order was calculated using the associationTest() function. Each trajectory was divided into 40 bins on the basis of cell order values. Genes with an FDR < 0.01 in both whole-transcriptome and nascent-transcriptome analyses were considered for GRN construction. Genes with an FDR < 0.01 and an absolute log₂ fold change > 1 were selected for heat-map visualization in Fig. 2f. Motif counts in single cells were obtained using the motifmatchr package (v.1.28.0) in R⁶⁷. The peak-to-cell raw matrix was used as input, sequences of peaks were scanned and motifs from the JASPAR database⁶⁸ were matched against sequences in open peaks of cells. Significant motifs (FDR < 0.01) were included in GRN construction and heat-map visualization. For DAP analysis along the drug-resistance trajectory, data sparsity in single-cell ATAC and program memory requirements were addressed by binning the trajectory as follows: 1,000 cells on the leftmost (melanocytic) side of the trajectory were divided into ten bins, 300 cells on the rightmost (undifferentiated) side were divided into three bins and the remaining cells in the middle were divided into ten bins. For bins in the middle containing more than 100 cells, 100 cells were randomly sampled. ATAC raw peak counts were aggregated for each bin, and only peaks with non-zero counts in more than one third of the bins were retained. Subsequent steps followed the methodology described above. Motif enrichment in DAPs was performed using HOMER (v.5.1)⁶⁹.

Melanoma phenotype switch GRN inference

The GRN inference was conducted through the following steps. First, we determined the initial and terminal states of each phenotype switch, which defined the directionality of the temporal constraints and the cell order window to use. Cells in the DMSO condition initially exhibited a neural crest-like phenotype and transdifferentiated into either melanocytic (DMSO neural-crest-like-to-melanocytic trajectory) or undifferentiated (DMSO neural-crest-like-to-undifferentiated-1 trajectory and DMSO undifferentiated-1-to-undifferentiated-2 trajectory) phenotypes. The bin containing the median cell order value of DMSO NTC cells was set as the root for both the DMSO neural-crest-like-to-melanocytic trajectory and the DMSO neural-crest-like-to-undifferentiated-1 trajectory. Additionally, the bin in which the threshold value between gene clusters 7 and 8 (Fig. 2f) was reached was set as the root for the DMSO undifferentiated-1-to-undifferentiated-2 trajectory. For cells under Vem treatment, we observed a strong trend of transdifferentiation towards the melanocytic side of the trajectory, particularly from the normal wild-type neural crest-like phenotype to the inactivated neural crest-like phenotype (Vem-sensitive trajectory). Conversely, drug-resistant cells transitioned from the inactivated neural crest-like phenotype to the undifferentiated phenotype (Vem-resistant trajectory). The root of the Vem-sensitive trajectory was set at the bin containing the median cell order value of DMSO NTC cells, whereas the root of the Vem-resistant trajectory was set at the bin immediately following the one containing the median cell order value of Vem NTC cells.

We performed k-means clustering (k = 10) on all significant differential expression steady-state and nascent z-scored gene expressions along the trajectory. Genes were filtered by correlating steady-state gene expression with nascent gene expression at the cluster level along the trajectory. Steady-state and nascent gene clusters with correlations of 0.6 or higher were matched, and genes present in both counterparts of matched clusters, with nascent gene expression peaking either before or simultaneously with steady-state gene expression in the direction of transdifferentiation, were retained as GRN target gene candidates. We then determined the association between TF motifs and genes. If TF motifs identified by motifmatchr were found in peaks significantly associated with the nascent expression of a gene, a TF–gene linkage was considered. Significant z-scored TF motif counts along the trajectory were clustered using k-means (k = 10), and only TF motif counts that showed consistent trends with their steady-state gene expression (correlation of 0.4 or higher and manual curation) were retained. Motif clusters and nascent gene expression clusters were matched in the same manner as described above. TF–gene linkages were retained only if they met the following criteria: (1) the associated peaks of a gene contained at least one motif of the TF from its matched counterpart motif cluster; and (2) the peak of motif opening along the trajectory occurred before or at the same time as the peak of nascent gene expression.

For each transdifferentiation trajectory segment (such as DMSO neural-crest-like-to-undifferentiated-1 trajectory, DMSO undifferentiated-1-to-undifferentiated-2 trajectory, DMSO neural-crest-like-to-melanocytic trajectory, Vem-sensitive trajectory and Vem-resistant trajectory), only clusters from all modalities (steady-state, nascent transcriptome and motif) that exhibit peak values within the segment were considered for segment-specific GRN construction.

Multimodal regulon scoring analysis

A regulon is defined as a group of target genes co-regulated by an upstream TF. We propose an integrated score (regulon score) to quantify the overall transcriptional activity of this gene group in a group of cells that received the same sgRNA:

$$\frac{1}{{mn}}{T}_{i}\mathop{\sum }\limits_{j=1}^{n}{G}_{j}\mathop{\sum }\limits_{k=1}^{m}{P}_{{jk}}$$

where $T\in ({T}_{1},{T}_{2},\ldots ,{T}_{l})$ is the log-transformed, normalized pseudobulk steady-state expression level of the regulator TF $l$ in a perturbed population; $G\in ({G}_{1},{G}_{2},\ldots ,{G}_{n})$ is the log-transformed, normalized pseudobulk nascent expression level of a target gene regulated by the TF in a perturbed population; ${P}_{j}\in ({P}_{j1},{P}_{j2},\ldots ,{P}_{jm})$ is the log-transformed, normalized ATAC sequencing counts at a peak associated with the target gene $j$, which contains the motif of TF $i$ in a perturbed population.

To identify the pattern of inter-regulon co-regulation in Vem resistance, we selected perturbations in the Vem condition that exhibited a significant shift towards the undifferentiated end of the trajectory with consistent growth phenotypes, as well as perturbations in the Vem condition that showed no significant shift in the trajectory. The fold change in regulon scores between perturbations and NTC was calculated, and Pearson correlations of these fold changes between regulons across the selected perturbations were computed. The distances between regulons were calculated, and hierarchical clustering was performed on the resulting correlation coefficient matrix. Clusters of regulons were then defined using the cutree() function.

To assess the statistical significance of the regulon score fold changes for each perturbation compared with NTC in the Vem condition, we implemented a permutation strategy. For each regulon in each perturbation, the order of $G$ was randomly permuted 500 times, and the permuted regulon scores were divided by the real regulon score of the NTC. This process generated a fold-change background distribution for each regulon in each perturbation. Two-sided empirical P values were calculated for each regulon in each perturbation and subjected to FDR correction. Regulon scores with FDR < 0.1 were considered significant.

Cell order–gene signature correlation analysis and YAP ChIP–seq analysis

To identify potential contributors to the cell-state shift in the trajectory, oncogenic signatures from MSigDB⁷⁰ in Vem-treated single cells were scored using the AddModuleScore() function in Seurat. Pearson correlations between cell order values and gene signature scores were calculated. Gene signatures with significant correlations (FDR < 0.01) were further tested with 1,000 permutations, followed by FDR correction, to confirm that the observed correlations were not attributable to random chance. YAP indirect binding peaks were obtained from a previous report⁴³. The Pscan-ChIP tool (v.1.3)⁷¹ was used to analyse local motif enrichment within these peaks, identifying potential direct downstream signal transmitters of YAP.

TF motif deviation scoring analysis

The chromVAR (v.1.28.0) package⁷² in R was used to compute TF motif deviation scores for individual cells. Motif scores for each cell were calculated using peaks that were open in at least ten cells. GC-matched backgrounds were then added using the addGCBias() function, and the computeDeviations() function was executed to obtain motif deviation scores. For each treatment condition, the deviation score distribution of each motif was compared with that of NTC cells. Simultaneously, the steady-state DEG results between perturbed populations and NTC cells were intersected with the motif score tests. Only TFs that were significant in the DEG analysis and exhibited a minimum level of gene-expression changes were retained. Two-sided P values for these motifs were further corrected for FDR. Significant motifs were defined as those with an FDR < 0.1, with enrichment or depletion determined on the basis of the median z-score differences between perturbed cells and NTC cells.

Transcriptome and TF deviation score-based clustering

To cluster perturbed populations, the top 200 whole-transcriptome DEGs with the highest fold changes were extracted and combined into a unified gene set. Pairwise Pearson correlation coefficients between perturbations were calculated on the basis of the fold changes of these populations compared with NTC cells. Perturbation clustering on the basis of motifs was performed using the median deviation score differences between the perturbed populations and NTC cells. Deviation score differences were set to 0 if a motif was not present in the original analysis output.

Mediator complex perturbation associated gene program analyses

We established the following criteria to identify gene signatures in oncogenic signatures that might account for heterogeneous phenotypic responses to genetic perturbations in the Mediator complex and Vem treatment: (1) under the DMSO condition, compared with NTC cells, the scores of gene signatures in both MED15 and MED24 showed no significant changes (on the basis of Wilcoxon tests with FDR corrections); and (2) under the Vem condition, compared with NTC cells, the scores of gene signatures in both MED15 and MED24 showed significant changes (on the basis of Wilcoxon tests with FDR corrections).

Published RNA-seq and ChIP–seq analyses

Genomic SOX10 binding profiles and paired bulk gene expression were analysed in a previous study⁷³ using another melanoma cell line harbouring the BRAF^V600E mutation (501Mel cell line) treated with a different MAPK signalling inhibitor (AZD6244). Validation analysis on SOX10 loss-of-activity effects was conducted using raw count matrices and processed ATAC genome tracks originally from a previous study⁴⁸.

For bulk RNA-seq processing, adapter sequences were removed in paired-end mode using Trim_Galore (v.0.6.7). Reads were mapped to the hg38 reference genome (GRCh38.p13; GENCODE) using STAR (v.2.7.9a). Reads with mapping scores below 10 were filtered out using SAMtools (v.1.13). PCR duplicates were removed with Picard (v.2.27.4), and gene counts were generated using featureCounts (v.2.0.1). For ChIP–seq processing, adapter sequences in FASTQ files were trimmed using Cutadapt (v.3.4). Reads were aligned to the hg38 reference genome using Bowtie 2. PCR duplicates were removed using Picard (v.2.27.4), and peaks were called using MACS2 (v.2.2.9.1) with a minimum q-value threshold of 0.01.

ChIP–seq peaks called from replicates were combined and resized to 500 bp, extending from the centre. A total of 2,003 and 7,159 peaks were identified for the DMSO and AZD6244 conditions, respectively, with 1,878 overlapping peaks. Significantly upregulated DEGs with an FDR < 0.05 were identified using the DESeq2 (v.1.50.2) package⁷⁴ in R. To identify potent MAPK/ERK inhibitor-responsive SOX10 targets, the nearest genes within 10 kb of 5,281 AZD6244-specific SOX10 binding peaks were identified using the closest command from bedtools (v.2.30.0)⁷⁵ and intersected with the significantly upregulated DEGs. Genes from this intersection that also met the following criteria were considered SOX10 direct targets responsive to Vem treatment: (1) the maximum steady-state expression in CPM in either DMSO NTC cells or Vem NTC cells was 20 or higher; (2) the gene was significantly upregulated in NTC cells upon Vem treatment; and (3) the fold change was 1.25 or higher. Ratios of these genes between the Vem-to-DMSO fold change in perturbed cells and the Vem-to-DMSO fold change in NTC cells were calculated to assess the strength of SOX10-mediated gene activation upon Vem treatment.

Likewise, to identify regulatory interactions between YAP and the Mediator complex, we reprocessed ChIP–seq data from different conditions originally described in a previous study⁴⁷. FASTQ files were preprocessed for peak calling as described above. MED1 ChIP–seq peaks present only under the control condition, absent under the siYAP/TAZ condition and that overlapped with YAP ChIP–seq peaks were defined as YAP–Mediator complex co-regulatory genomic regions. GREAT⁷⁶ was used to link regulatory regions to putative target genes. Only target genes that were significantly upregulated in LATS2-knockdown and NF2-knockdown cells in our study were retained as YAP–Mediator complex co-regulated genes. Validation analyses and visualization of Mediator binding signals for representative genes were performed using normalized bigWig data originally from a previous study⁴⁶.

Validation on patient tumour samples was conducted using RNA-seq data from MAPKi-sensitive and MAPKi-resistant tumours²⁸. DESeq2 was used for initial DEG identification, with tumour drug sensitivity as the primary covariate and patient identity as a confounding covariate. The AddModuleScore() function in Seurat was used to score phenotype programs and TF modules. For TF regulons with large target sets, the top 500 genes with the highest average expression across tumour samples were selected for scoring. Penalized linear regression was performed using the glmnet (v.4.1-10) package in R⁷⁷ with elastic net regularization (α = 0.3). The pROC (v.1.19.0.1) package in R⁷⁸ was used to evaluate area under the curve. DESeq2 was also used to process RNA-seq data from A375 cells overexpressing multiple YAP mutants⁷⁹.

HCR staining on intracellular RNA

HCR was performed to quantify intracellular AXL expression levels. Probe pairs targeting AXL messenger RNA were designed in-house according to a previous study⁸⁰, and the following steps were carried out on the basis of a published protocol⁸¹. In brief, cells were fixed in 4% formaldehyde at room temperature for 1 h, followed by ethanol permeabilization at −20 °C overnight. Hybridization was performed at 37 °C overnight using 40 nM per probe. Hairpins conjugated with an Alexa 647 fluorophore (Molecular Instruments; HCR Gold Amplifier) were used for signal amplification overnight at room temperature. Flow cytometry analysis was performed to record intracellular signals.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Mapping convergent regulators of melanoma drug resistance by PerturbFate

Cell culture

Cell strain construction

Plasmid and lentivirus construction

Gene selection and sgRNA evaluation

CRISPR screen experimental procedure

Sequencing data processing

Measurement of growth phenotypes in bulk CRISPR screens

Identification of sgRNA singlets and examination of knockdown efficiency

Differential gene expression and differentially accessible peak analyses

UMAP embedding on pseudobulk perturbations

ATAC peaks–nascent gene-expression linkage identification

Cell trajectory and nascent-transcriptome-based RNA velocity inference

Significance examination of perturbed cells in the trajectory

Identification of DEGs, differentially accessible motifs and DAPs along the cell trajectory

Melanoma phenotype switch GRN inference

Multimodal regulon scoring analysis

Cell order–gene signature correlation analysis and YAP ChIP–seq analysis

TF motif deviation scoring analysis

Transcriptome and TF deviation score-based clustering

Mediator complex perturbation associated gene program analyses

Published RNA-seq and ChIP–seq analyses

HCR staining on intracellular RNA

Reporting summary

The heart has a ‘little brain’ — and it protects against stress

Structural basis of the RNA-editing cascade in trypanosome mitochondria

Could small-team science be dying out?

Most Popular

Menlo Ventures’ Matt Murphy explains why Anthropic is winning (and it’s not the model)

The heart has a ‘little brain’ — and it protects against stress

Top Orthotics From Dr. Scholl’s, Vionic

Are Your Prices Fair? There’s a Simple Formula to Find Out

Recent Comments

ABOUT US

POPULAR POSTS

Menlo Ventures’ Matt Murphy explains why Anthropic is winning (and it’s not the model)

The heart has a ‘little brain’ — and it protects against stress

Top Orthotics From Dr. Scholl’s, Vionic

POPULAR CATEGORY