Single-cell transcriptomic and chromatin dynamics of the human brain in PTSD

Human tissue donors

Human post-mortem brain tissue samples were obtained from the National Center for PTSD Brain Bank (with consent of the next of kin) and the University of Pittsburgh Tissue Donation Program. Individuals were a mix of European and African-American descent. Male and female individuals were group matched for age and post-mortem interval (PMI). Sociodemographic and clinical details are listed in Extended Data Fig. 1a and include comorbidities such as the use of tobacco and substances of abuse. Inclusion criteria for PTSD, MDD and control cases were as follows: PMI < 48 h, age range of more than 18 to less than 85 years. A total of 111 individuals (36 PTSD: 19 female and 17 male individuals; 36 MDD: 18 female and 18 male individuals; and 39 healthy controls: 18 female and 21 male individuals) were used in this study. The brain tissue was fresh-frozen and samples from the DLPFC (Broadman areas 9/46)⁵³ were collected (approximately 25 mg) from each post-mortem sample.

Psychiatric history and demographic information were obtained by psychological autopsies performed post-mortem, as well as a review of medical records and toxicology reports. Trained clinicians conducted structured interviews with informants (usually the next of kin) with knowledge of the deceased individuals. To avoid systematic biases, PTSD, MDD and control cases from each source were characterized by the same psychological methods. Consensus DSM-IV diagnoses for each participant were made by trained clinicians who did not conduct the psychological autopsies. Of the 36 individuals in the PTSD cohort, 75% (27 cases) were also comorbid for MDD.

Isolation of nuclei from brain cells for RNA, ATAC and multiome assays

Regions of interest were dissected on cryotome (leaflets of approximately 100–300 µm) from frozen DLPFC and stored at −80 °C. Cell nuclei isolation from brain sections were treated similarly to already established protocols^18,54,55 with some modifications needed to have nuclei suitable for two separate assays: gene expression and ATAC. To avoid experimental bias, nucleus isolation was done by the same person blinded to the metadata for the particular sample. All reagents were molecular biology grade and sourced from Sigma unless stated otherwise. Small amounts of tissue (10–20 mg) were added into 1 ml of ice-cold lysis buffer (‘buffer A’ is 250 mM sucrose, 25 mM KCl, 5 mM MgCl₂, 10 mM Tricine-KOH (pH 7.8), protease inhibitors without EDTA (Roche), RNAse inhibitor (80 U ml⁻¹; Roche), 1 mM dithiothreitol, 1% BSA (m/v; Gemini Bio-Products), 0.3% NP-40 (v/v), 0.15 mM spermine, 0.5 mM spermidine and water; dithiothreitol, RNAse Protector, protease inhibitors, spermine, spermidine and NP-40 were added immediately before use. The suspension was transferred to a 2-ml Dounce tissue homogenizer (Kimble) and lysed with constant pressure and without introduction of air with pestle A (2 × 5) and pestle B (4 × 5) in cycles of 5. The homogenate was strained through a pre-wetted 40-μm tube top cell strainer (Corning). All subsequent centrifugation was performed in a refrigerated, bench-top centrifuge with swing-out rotor (Eppendorf). Lysates were centrifuged at 1,000g for 10 min at 4 °C, pellets were saved and resuspended in 0.4 ml lysis buffer buffer A. The final 0.4 ml of solution was mixed with 0.4 ml (1:1) of Optiprep solution (buffer B was iodixanol 50% (v/v), 25 mM KCl, 5 mM MgCl₂, 20 mM Tricine-KOH (pH 7.8), protease inhibitors without EDTA, RNAse inhibitor (80 U ml⁻¹), 1 mM dithiothreitol and water. The suspension (25% iodixanol final) was mixed 10× head over tail. The 2-ml tube was filled with 0.6 ml of 40% iodixanol cushion (appropriate mix of buffer A and buffer B), then overlayed with 0.6 ml of 30% iodixanol (appropriate mix of buffer A and buffer B), and sample suspension was overlayed at the top. The tubes were then centrifuged at 3,000g for 30 min at 4 °C without brakes. After centrifugation ended, the interphase 30–40% ring was collected. Out of 300 µl collected, 100 µl was used for gene expression studies and 200 µl was used for ATAC studies. One thousand microlitre ‘RNA wash buffer’ was added to gene expression samples, and samples were mixed and left on ice for 10 min. RNA wash buffer was: RNase Protector (80 U ml⁻¹), 1 mM dithiothreitol, 25 mM KCl, 5 mM MgCl₂, 20 mM Tris-HCl (pH 7.5), 1% (m/v) BSA, 0.1% (v/v) Tween-20 and DPBS (14190, Gibco). Five hundred microlitres of ATAC lysis buffer was added to the ATAC sample, mixed well and left on ice for 5 min. The ATAC lysis buffer was: 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl₂, 0.01% (v/v) Tween-20, 0.01% (v/v) NP-40, 0.001% (v/v) digitonin, 1% (m/v) BSA and nuclease-free water. After 5 min, 500 µl of ATAC wash buffer was added to ATAC sample, mixed well and left on ice for 5 min. ATAC wash buffer was: 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl₂, 0.1% (v/v) Tween-20, 1% (m/v) BSA and nuclease-free water. Both samples were centrifuged at 1,000g for 10 min at 4 °C, supernatants were carefully and completely removed. Gene expression sample was counted on a haemocytometer, resuspended in ‘RNA run buffer’ (RNase Protector (80 U ml⁻¹), 1 mM dithiothreitol, 25 mM KCl, 5 mM MgCl₂, 20 mM Tris-HCl (pH 7.5), 250 mM sucrose and water) and concentration adjusted to 1 M per millilitre. The ATAC sample was counted on a haemocytometer and resuspended to 3.3 M per millilitre in ATAC run buffer (1× Nuclei Buffer (PN2000153, 10X Genomics) and water). The samples were brought to the Yale Center for Genome Analysis (YCGA) within 15 min and processed by following protocols for 10X Genomics Chromium Single Cell ATAC (CG000168 Rev B, 10X Genomics), and 10X Genomics Chromium Single Cell 3′ Reagent Kits v3 (CG000183 Rev C, 10X Genomics), respectively, for targeting 10,000 nuclei. For some assays that were done with 10X Genomics Chromium Next GEM Single Cell Multiome ATAC + Gene Expression (CG000338 Rev A, 10X Genomics) the 300 µl sample post-3,000g centrifugation was used in its entirety, the steps for ATAC were followed, and the buffers for ATAC had 80 U ml⁻¹ RNAse Protector. After the last centrifugation, the samples were resuspended in ‘diluted nuclei buffer’ (1× Nuclei Buffer (PN-2000207), 1 mM dithiothreitol, 1 U ml⁻¹ RNase Protector and water) counted on a haemocytometer at the concentration of 3.3 M ml⁻¹ and submitted to the YCGA where the protocol was followed for targeting 10,000 nuclei. Libraries were sequenced with paired-end 150-bp reads on an Illumina NovaSeq 6000 to a target depth of up to 500 million read pairs per sample for gene expression (RNA), and up to 500 million reads per sample for ATAC.

Xenium in situ transcriptomics

Gene panel design

The Xenium In Situ platform uses targeted panels to detect expression genes. Genes (n = 267) are included on the Xenium Human Brain Panel, and an additional 99 genes were selected and curated primarily based on our single-cell atlas of DEGs for the PTSD brain (366 genes total). The complete list of genes in our Xenium panel can be found in Supplementary Table 23.

Xenium sample preparation

Fresh-frozen tissue samples from control (n = 10), PTSD (n = 4) and MDD (n = 4) participants were obtained from the National PTSD Brain Bank. Frozen blocks were dissected from the DLPFC from the superior frontal gyrus approximately 1 cm anterior to the genu of the corpus callosum across all participants. Blocks were sectioned at 10 mm thickness and placed on a 10X Genomics Xenium slide as per protocol (10X Genomics protocol CG000580 Rev B). The Xenium slide with the thaw-mounted tissue was fixed in paraformaldehyde solution and then permeabilized in 70% methanol. Tissue slides were washed and then inserted into 10X Genomics Xenium slide cassettes and then processed through hybridization, ligation and amplification protocols (10X Genomics protocol CG000582 Rev C). In brief, the pre-designed Xenium Human Brain Gene Expression panel with 266 genes along with 99 custom probes were added to the tissue. Each circularizable DNA probe contained two regions that hybridized to target RNA and a third region that encoded a gene-specific barcode. Probes were loaded at 10 nM at 50 °C for overnight hybridization. For ligation, the two ends of the probes bound the target RNA and were ligated to generate a circular DNA probe at 37 °C for 2 h. Following ligation, the circularized probe was amplified for 2 h at 30 °C, producing multiple copies of the gene-specific barcode for each target.

Xenium imaging

Prepared tissue slides were then loaded for imaging on the Xenium Analyzer for in situ analysis. Fluorescently labelled oligos were bound to the amplified DNA probes. Cyclical rounds of fluorescent probe hybridization, imaging and removal generated optical patterns specific for each barcode, which were converted into a gene identity. Identified transcripts were then visualized using Xenium Explorer software before bioinformatics analysis.

Animals

All animal care and procedures were performed according to the ethical guidelines of the Institutional Animal Care and Use Committee at Yale University School of Medicine and Yale Animal Resources Center. Adult 2-month-old male and female C57BL/6 SST–Cre mice (n = 31 mice in total) were used for this study. Until stress, all mice were group housed and maintained in standard environmental conditions (23 °C; 12 h–12 h light–dark cycle) with ad libitum food and water.

Viral injections

Adult mice were anaesthetized using 1–2% isoflurane, placed in a stereotaxic apparatus, and maintained at 37 °C during surgery. Each mouse received a unilateral microinjection of AAV-Ef1a-DIOhChR2(123T/T159C)-eyfp (500 nl; Addgene) using a Nanoject III (Drummond) at 50 nl min⁻¹ with a 5–10-min diffusion period before removing the glass capillary. The coordinates used for mPFC were anteroposterior (AP) = +1.5 mm, mediolateral = +0.2 mm, and dorsoventral = +2.4 mm. Two weeks were allowed for recovery and viral expression after injections.

SPS

Mice for SPS were handled for 3 days before the stress protocol. On the day of SPS, mice (n = 16) underwent a 2-h restraint in a restrainer followed by forced swim for 20 min in 23–25 °C water. After 15 min of recuperation, mice were exposed to diethyl ether vapour until loss of consciousness. Mice were then single housed until euthanized for slice electrophysiology (7–14 days after SPS). Control mice (n = 15) were group housed throughout the experiment. All mice were weighed daily after SPS. Animals were randomly assigned to control or SPS groups. Analysis of behavioural data was performed blinded to condition. Behavioural data have been provided in Source Data File 1.

Slice electrophysiology

Brain slices containing the mPFC were prepared from control and SPS-treated male and female mice as previously described^56,57. After decapitation, brains were removed rapidly and placed in ice-cold (approximately 4 °C) artificial cerebrospinal fluid (ACSF) in which sucrose (252 mM) was substituted for NaCl (sucrose-ACSF) to prevent cell swelling. Coronal slices (300 μm) containing the mPFC were cut in sucrose-ACSF with an oscillating-blade vibratome (Leica VT1000S). Slices were then transferred to the fixed stage of an Olympus BX50WI microscope for whole-cell recordings. The chamber was continuously perfused with normal ACSF at a flow rate of 2–3 ml min⁻¹, with the temperature maintained at 33 ± 0.5 °C. The standard ACSF (approximately pH 7.35) was equilibrated with 95% O₂/5% CO₂ and contained: 128 mM NaCl, 3 mM KCl, 2 mM CaCl₂, 2 mM MgSO₄, 24 mM NaHCO₃, 1.25 mM NaH₂PO₄ and 10 mM d-glucose. Pyramidal neurons were visualized using a microscope (×40 IR lens) with infrared differential interference contrast (IR/DIC). Low-resistance patch pipettes (3–5 MΩ) were pulled from patch-clamp glass (Warner Instrument) using a horizontal micropipette puller (P-1000, Sutter Instrument). Patch pipettes (3–5 MΩ) were filled with a solution contained the following: 125 mM Cs gluconate, 10 mM HEPES, 5 mM BAPTACs, 10 mM Na2-phosphocreatine, 2.38 mM CaCl₂, 4 mM Mg-ATP and 0.3 mM Na2GTP pH 7.33. Whole-cell recordings were performed using an Axoclamp-2B amplifier (Axon Instruments). The output signal was low-pass-filtered at 3 kHz, amplified 100× and digitized at 15 kHZ and acquired by using Clampex 10.5/Digidata 1550A software.

Tonic GABA current was recorded in voltage-clamp configuration by holding the membrane potential at 0 mV. The difference between holding currents in the presence of the GABA_B receptor agonist baclofen (10 μΜ; Tocris Bioscience) and after the application of the GABA_B receptor antagonist CGP 55845 (20 µM; Tocris Bioscience) was measured. Cells were held for 8 min before recording baseline for 2 min, followed by 5–8 min of recording in each drug application. One minute of recorded traces at plateau of each condition was analysed. Holding current was measured by using a Gaussian fit of the histogram of the 1-min trace.

For experiments measuring eIPSCs by stimulating SST interneurons optogenetically, responses in pyramidal cells were recorded when blue light (470 nm) was applied using CoolLED pE-800 through the ×60 objective. The intensity was set to obtain optimal single-peak responses (approximately 300 pA at baseline, 0.1-ms stimulation). Ten traces were recorded and averaged for light responses. Average eIPSC peak amplitudes before and after bath applying MRK 016 (1 μΜ; MedChemExpress) were measured to calculate the percentage change in IPSC amplitude. Electrophysiology data have been provided in Source Data File 1.

iPS cell-derived glutamatergic neuron transfections

Control human induced pluripotent stem (iPS) cell-derived NPCs (neural progenitor cells) from two lines: NSB553-S1-1, (male, European ancestry) and NSB2607-1-4 (male, European ancestry, source: National Institute of Mental Health) were used. Both lines were mycoplasma negative at the time of experimentation. NPCs were cultured in hNPC medium (DMEM/F12 (10565, Life Technologies), 1× N2 (17502-048, Life Technologies), 1× B27-RA (12587-010, Life Technologies), 1× antibiotic–antimycotic and 20 ng ml⁻¹ FGF2 (Life Technologies)) on Geltrex (A1413301, Thermo Fisher). hiPS cell–NPCs at full confluence (1–1.5 × 10⁷ cells per well of a six-well plate) were dissociated with Accutase (Innovative Cell Technologies) for 5 min, spun down (5 min at 1,000g), resuspended and seeded onto Matrigel-coated plates at 3–5 × 10⁶ cells per well. Medium was replaced every 2–3 days for up to 7 days until the next split.

At day −1, confluent hiPS cell–NPCs were transduced with rtTA (Addgene 20342) and NGN2 (Addgene 99378) lentiviruses. At day 0, 1 µg ml⁻¹ dox (doxycycline) was added to induce NGN2 expression. At day 1, transduced hiPS cell–NPCs were treated with antibiotics to select for lentiviral integration (1 mg ml⁻¹ G-418). At day 3, NPC medium was switched to neuronal medium (Brainphys; 05790, Stemcell Technologies), 1× N2 (17502-048, Life Technologies), 1× B27-RA (12587-010, Life Technologies), 1 µg ml⁻¹ natural mouse laminin (Life Technologies), 20 ng ml⁻¹ BDNF (450-02, Peprotech), 20 ng ml⁻¹ GDNF (450-10, Peprotech), 500 µg ml⁻¹ dibutyryl cyclic-AMP (D0627, Sigma) and 200 nM l-ascorbic acid (A0278, Sigma) including 1 µg ml⁻¹ dox. Of the medium, 50% was replaced with fresh neuronal medium once every second day.

On day 5, young hiPS cell–NPC NGN2 neurons were replated onto Geltrex-coated plates. Cells were dissociated with Accutase (Innovative Cell Technologies) for 5–10 min, washed with DMEM, gently resuspended, counted and centrifuged at 1,000g for 5 min. The pellet was resuspended in neuron medium, and cells were seeded at a density of 1.2 × 10⁶ per well of a 12-well plate. At day 11, isogenic glutamatergic neurons were treated with 200 nM Ara-C (C6645, Sigma) to reduce the proliferation of non-neuronal cells in the culture, followed by half medium changes. At day 17, Ara-C (cytosine β-d-arabinofuranoside hydrochloride) was completely withdrawn by a full medium change while adding medium containing four short hairpin RNA (shRNA) vectors targeting a transcription factor (Sigma-Aldrich; targeting SMARCC1: TRCN0000015632, TRCN0000015631, TRCN0000015630 and TRCN0000015628; targeting EGR2: TRCN0000013841, TRCN0000013840, TRCN0000013839 and TRCN0000013838; total multiplicity of infection = 1.0) or a multiplicity of infection-matched scramble shRNA control vector SHC016 (Sigma-Aldrich). Medium was switched to non-viral medium 4 h post-infection. At day 19, transduced isogenic glutamatergic neurons were treated with corresponding antibiotics to the shRNA lentiviruses (1 μg ml⁻¹ puromycin) followed by half medium changes until neurons were harvested in TRIzol reagent (Invitrogen) at day 21 and bulk sequenced at the Yale Center for Genome Analysis for RNA integrity number and concentration, which were assessed using a Bioanalyzer (Agilent). Libraries were constructed using the SMARTer Stranded RNA-seq Kit (Takara Bio) preceded by rRNA depletion using 1 µg of total RNA. Samples were barcoded for multiplexing and sequenced at 75-bp paired-end on an Illumina HiSeq4000. Samples were pooled eight per lane and sequenced at a depth of 50 million reads.

snRNA-seq processing and analysis pipeline

The count matrix was generated by aligning reads to the hg38 genome using 10X Genomics CellRanger⁵⁸ (v6.1.1). For ambient RNA removal, to further eliminate technical artefacts and background noise in each sample, we used the remove-background command from CellBender (v0.2.2)⁵⁹, which outputs an HDF5 file in which the empty droplets were filtered out. For doublet detection, doublets were identified using a combination of two computational methods — Scrublet⁶⁰ (v0.2.3) and DoubletDetection⁶¹ (v4.2) — and removed from each sample. For sample aggregation, we clustered each sample in Pegasus⁶² (v1.5.0), a Python tool for analysing transcriptomes of single cells, then manually inspected all uniform manifold approximation and projection (UMAP) embeddings and removed six low-quality samples that displayed low heterogeneity of cell types, resulting in a total of 105 snRNA-seq samples. After aggregating these samples into a single data object, we filtered cells based on the following criteria: at most 10% mitochondrial genes, at least 200 genes and at least 500 unique molecular identifiers. Mitochondrial, sex and ribosomal genes were excluded, and only the robust genes were included in the final gene set. The final snRNA-seq data object had 935,371 cells and 27,982 genes. For clustering and annotation, after cell and gene filtration, the data were log-normalized. Next, the top 2,000 highly variable features were found, and principal component analysis was applied using the top 50 components. Batch correction was performed with Harmony. The top 50 components were then used to build a k-nearest-neighbours graph with k = 100 neighbours, and Leiden clustering was used to identify cell clusters. After the first round of clustering, marker genes were used to inspect and remove clusters that had mixed or unmatched marker gene expression, and another round of clustering was performed. Cell-type annotation was adapted using markers from Ma et al.¹⁸ with three levels of hierarchy: (1) 7 cell types, (2) 14 subtypes, and (3) 61 transcriptomic subtypes. The seven cell types include: EXN, IN, OLG, OPC, END, AST and MG. The 14 subtypes include: CUX2, RORB, FEZF2, OPRK1, LAMP5, KCNG1, VIP, SST, PVALB, OLG, OPC, END, AST and MG. Subclustering was performed on each of the 14 subtypes using markers from Ma et al.¹⁸ to identify the gene label for each subcluster, resulting in 61 distinct transcriptomic subtypes at the finest clustering level (Extended Data Fig. 2d,e).

snATAC-seq processing and analysis pipeline

The peak count matrix was generated by aligning reads to the hg38 genome using 10X Genomics CellRanger ATAC¹⁴ (v2.0.0). For quality control and filtering, using the outputs of CellRanger, we created Arrow files for each sample with cells filtered based on the quality control parameters filterTSS=4 and filterFrags=1000 in ArchR⁶³ (v1.0.2), an R package for analysing scATAC-seq data. ArchR considers three characteristics: (1) the number of unique nuclear fragments, (2) the signal-to-background ratio, and (3) the fragment size distribution. Owing to nucleosomal periodicity, we expected to see depletion of fragments that are the length of DNA wrapped around a nucleosome, around 147 bp. We used the addDoubletScores function to infer potential doublets that can confound downstream results. For sample aggregation, after manually inspecting the quality control plots, we removed several low-quality samples, the matching ATAC samples of the six low-quality RNA samples from above, and samples that did not have a matching RNA sample, which resulted in a total of 94 samples. We integrated these samples into an ArchR project, on which all downstream analyses for snATAC-seq were performed. The filterDoublets function was used to filter doublets for downstream analyses. For clustering and annotation, the addIterativeLSI function was used to implement iterative LSI dimensionality reduction. Then, addClusters, which uses Seurat’s graph clustering as the default clustering method, was used to call clusters in the reduced dimension subspace. We visualized the embedding using addUMAP and plotEmbedding. Then, we annotated the clusters using marker genes from Lake et al.⁶⁴. After inspecting the marker genes, we manually cleaned the UMAP and repeated this process twice to produce a final embedding of 473,321 cells annotated into seven cell types: EXN, IN, OLG, OPC, END, AST and MG.

snMultiome processing and analysis pipeline

For count matrix generation, we ran CellRanger ARC (v2.0.2) with the hg38 genome as the reference to process the initial reads from the raw FASTQ files. For quality control and filtering, we used Signac⁶⁵ (v1.11.0), a comprehensive R package for the analysis of single-cell chromatin data, to preprocess each snMultiome sample. Signac creates a Seurat object containing both RNA and ATAC assays. For quality control, we filtered cells using the following parameters: nCount_ATAC > 1,000, nCount_RNA > 200 and TSS.enrichment > 2. Then, we filtered doublets in the snRNA-seq data using DoubletDetection. For sample aggregation, we clustered each sample in Signac and aggregated a total of 25 snMultiome samples in the final data object after manually inspecting each UMAP embedding. For clustering and annotation, we normalized the gene expression data using SCTransform and reduced the dimensionality using principal component analysis. For chromatin accessibility data, we performed latent semantic indexing with functions FindTopFeatures, RunTFIDF and RunSVD to reduce sparsity, which is common in snATAC-seq data. We then computed a joint neighbour graph that represents both gene expression and chromatin accessibility measurements using the weighted nearest-neighbour methods in Seurat v4 using the function FindMultiModalNeighbors. Finally, we plotted the joint UMAP embedding with RunUMAP and annotated the 119,431 cells into 7 cell types and 14 subtypes (same as in snRNA-seq) using canonical marker genes. We converted the Seurat object into an ArchR project to call peaks on the 7 cell types and 14 subtypes.

Xenium processing and analysis pipeline

For segmentation, we analysed the Xenium data by both cell and nuclei. For cellular boundaries, we used the default settings from the output of Xenium Analyzer, which is based on a 15-μm outward expansion of DAPI-stained nuclei. For nuclear boundaries, we performed resegmentation using the resegment command from 10X Genomics Xenium Ranger (v2.0) with parameters –expansion-distance=0 and –resegment-nuclei=true. For quality control and filtering, we used Squidpy³³ (v1.6.1), a tool for analyzing spatial single-cell data, to create a UMAP embedding for each sample. For sample aggregation, after inspecting the individual sample UMAP embeddings and quality control metrics, such as the total and unique number of transcripts per cell or nucleus, we aggregated a total of 18 Xenium samples. For clustering and annotation, clustering was performed separately for scXenium and snXenium, resulting in 550,520 cells in the scXenium UMAP embedding and 523,314 nuclei in the snXenium UMAP embedding. Both had a fixed number of 366 genes. The clustered object was annotated into the seven canonical cell types, on which DEG analysis was performed. DEG analysis was performed using MAST for both scXenium and snXenium, and the results were compared with the snRNA-seq MAST results. scXenium and snXenium used thresholds |FC | > 1.1 and FDR < 0.01, and snRNA-seq used thresholds |FC | > 1.2 and FDR < 0.01. For visualization, SpatialData⁶⁶ (v0.1.2) was used to visualize the snXenium slides to highlight each cell-type-annotated nuclei and individual transcripts in their exact spatial coordinates. A single CON slide and a single PTSD slide were used to display differences in gene expression for a particular gene across the entire DLPFC, although the exact difference in gene expression was quantified using all CON samples versus all PTSD samples. This is shown as a barplot of log-normalized counts comparing the expression of a gene in CON versus PTSD across the seven cell types.

Differential abundance analysis

We calculated the differential abundance of cell types between conditions for each data modality using single-cell compositional data analysis (scCODA (v0.1.9))¹⁷. For RNA and Multiome, we ran the analysis across the 14 subtypes; for ATAC and Xenium, we ran the analysis across the 7 cell types. For each modality, we used the CompositionalAnalysis function with parameter reference_cell_type=‘automatic’, which automatically selects the cell type with the least amount of proportional change across conditions. We performed this analysis for PTSD versus CON, MDD versus CON, and PTSD versus MDD to compare across the three diagnostic cohorts. At FDR < 0.05, we found a significant decrease in OLGs for PTSD versus CON and PTSD versus MDD.

Differential gene expression analysis

snRNA-seq-based differential expression analysis was performed for each of the 14 subtypes using four tests: (1) MAST¹⁹ (v1.21.3) with covariate correction including covariates age death, sex, ancestry, PMI, RNA integrity number and library size; (2) Wilcoxon without covariate correction using the FindMarkers function from Seurat (v5.2.1)⁶⁷; (3) Nebula (v1.5.3)²⁰ for correcting sample-specific effects with covariate correction; and (4) DESeq2 (v1.46.0)²¹ with covariate correction. For MAST and Wilcoxon, we first log-transformed the CPM (counts per million)-normalized values of the raw count matrix. Only genes expressed in at least 5% of the cells in either condition, the robustly expressed genes, were included in the analysis. For Nebula, we only used genes expressed in at least 20% of the cells in either condition and removed samples that had less than 30 cells to reliably estimate the dispersion parameters. In the Nebula function, we used an offset term that combines n_counts and n_genes for each cell (0.8 × nCount + 0.2 × nFeature), the negative binomial gamma mixed model, kappa = 800 and cutoff_cell = 20. For DESeq2, we removed genes with either less than 20 reads per sample or a CPM value less than 0.5, and removed MDD samples with a very low number of read counts. We then utilized the default settings recommended for single-cell analysis such as test = ‘LRT’, minmu = 1 × 10⁻⁶ and minReplicatesForReplace = Inf. We set the fold-change threshold for significant DEGs at |FC | > 1.2; we also constrained the significant DEGs to be FDR < 0.01 for MAST and Wilcoxon, FDR < 0.1 for Nebula, and FDR < 0.05 for DESeq2 pseudobulk.

For snRNA-seq, we first performed DEG analysis comparing PTSD versus CON and MDD versus CON using all samples from the diagnostic cohort. We identified PTSD snDEGs (n = 1,184) and MDD snDEGs (n = 1,918) by overlapping MAST and Wilcoxon DEGs that were concurrent in direction across the seven cell types. We then compared these snRNA-seq DEGs to the bulk RNA-seq DEGs from Girgenti et al.⁹ and Chatzinakos et al.²⁶ studies. These DEG sets were used in downstream analyses, including cross-disorder comparisons between PTSD and MDD, comparisons with CLGs in the ATAC analysis, and the identification of downstream targets of cell-type-specific transcription factors. We also investigated sex differences by first performing a baseline comparison of CON (male) versus CON (female), followed by PTSD (male) versus CON (male), PTSD (female) versus CON (female), MDD (male) versus CON (male), and MDD (female) versus CON (female). To examine PTSD samples comorbid with MDD, we performed PTSD (+MDD) versus CON and PTSD (−MDD) versus CON analyses and compared the results to PTSD versus CON in a three-way set relationship. For snXenium and scXenium, we performed DEG analysis using MAST and compared the results to snRNA-seq MAST results to validate the DEGs.

PTSD versus MDD cross-disorder comparison

To investigate the convergent and divergent properties between PTSD and MDD, we first overlapped the PTSD snDEGs and MDD snDEGs, which yielded 502 DEGs specific to PTSD and 1,236 DEGs specific to MDD. Next, we used the Rank–Rank Hypergeometric Overlap (v1.46.0)³⁰ method to identify concordant and discordant gene signatures between PTSD and MDD in a threshold-free manner. We identified 12 significantly discordant (down in PTSD and up in MDD) genes across cell types. We separated the PTSD cohort into those with co-existing MDD (n = 27 samples) and those without MDD (n = 8 samples) and performed DEG analysis against CON to identify several genes specific to PTSD. We also identified PTSD-specific and MDD-specific DEGs from the snXenium and scXenium MAST DEG analyses.

Gene Ontology enrichment analysis

We performed all Gene Ontology enrichment analyses using the Enrichr R package (v3.4)⁶⁸. We conducted a total of five Gene Ontology Enrichr analyses for: PTSD snDEGs, PTSD cell-type-specific DEGs, 502 PTSD-specific DEGs, PTSD Nebula DEGs and PTSD DESeq2 pseudobulk DEGs. We used the Enrichr function to query the Gene Ontology Molecular Function 2023 and Gene Ontology Biological Process 2023 databases. We highlighted the top FDR significant terms and removed duplicates for all Gene Ontology barplots.

CCC analysis

We applied the standard workflow of CellChat⁶⁹ (v1.5.0), utilizing single-cell gene expression of ligands and receptors to infer a CCC network. We used the normalized count matrix and cell-type annotations from the snRNA-seq dataset. Among all possible ‘sender’ (a cell expressing ligand) and ‘receiver’ (a cell expressing receptor) pairs, we constructed a network of communication strength based on the expression levels of ligand and receptor genes across cell types. High communication strength exists if the sender cell type has a high expression of ligand genes and the receiver cell type has a high expression of receptor genes, and vice versa. The number of inferred ligand–receptor pairs depends on the method for calculating the average gene expression per cell group. We utilized the robust method in CellChat for mean calculation called ‘truncatedMean’, with the cut-off ‘trim’ parameter set to the default value of 0.05. Two separate analyses were done for each diagnostic condition, namely, PTSD and CON. We then identified the differential CCCs across conditions. Most importantly, we normalized the 3D matrices (sender cell types × receiver cell types × ligand–receptor interactions) in each of the CellChat objects to have the same sum to account for batch effects of snRNA-seq. We then summed across the ligand–receptor interactions to reduce the 3D matrices into a 2D CCC network. A differential communication network was then constructed through the subtraction of these two networks between PTSD and CON groups. Nodes were coloured based on the aggregation of sending communication strength, or receiving communication volume, for output or input, respectively. We further highlighted specific cellular communication differences by using a subset of the 3D matrix (that is, MG cell type × receiver cell types × ligand–receptor interactions), effectively reducing it into a 2D object.

Neuronal-specific CCC analysis

We applied the standard workflow of NeuronChat (v1.1.0)³⁶, which utilizes single-cell gene expression of ligands and receptors to infer a neural-specific CCC network. We used the normalized count matrix and cell-type annotations from the snRNA-seq dataset. The number of inferred ligand–receptor pairs depends on the method for calculating the average gene expression per cell group. We utilized the default method in NeuronChat of mean calculation called ‘trimean’, with the cut-off ‘trim’ parameter set to 0.1. NeuronChat places greater emphasis on the second quartile, followed by the third and fourth quartiles, of gene expression. Two separate analyses were done for each diagnostic condition, namely, PTSD and CON, before moving on to comparative differential analyses. For the differential comparison analyses, we normalized the 3D matrices (sender cell types × receiver cell types × neuroligand–receptor interactions) in each of the CellChat objects to have the same sum to account for batch effects across snRNA-seq samples. We then summed across the neuroligand–receptor interactions to reduce the 3D matrices into 2D CCC networks. A differential communication network can then be constructed through the subtraction of these two networks between PTSD and CON groups. Sender (ligand) communication based on gene expression levels was aggregated to calculate the overall sending output for each cell type. The difference in sending cell signalling expression was calculated by subtracting between PTSD and CON groups. Specific pathways, or slices of the 3D matrices, were taken to explore which neurotransmitter–receptor interactions contributed to the difference between PTSD and CON.

Peak calling on snATAC-seq clusters

Pseudobulk replicates were created for each of the seven cell types with the addGroupCoverages function in ArchR to resolve inherent sparsity in the snATAC-seq data. MACS2 (v2.2.9.1)⁷⁰ was then used to call peaks on each pseudobulk replicate. ArchR uses an iterative overlap peak merging procedure with the addReproduciblePeakSet function to create a final peak set of fixed-width peaks using the following procedure: (1) peaks were first ranked by their significance, (2) the most significant peak was retained and any peak that directly overlapped with the most significant peak was removed from further analysis, and (3) of the remaining peaks, this process was repeated until no more peaks existed. ArchR analyses all the pseudobulk replicates from a single-cell type together, performing the first iteration of iterative overlap removal, then checks to see the reproducibility of each peak across pseudobulk replicates and only keeps peaks that pass a threshold indicated by the reproducibility parameter, resulting in a single merged peak set for each cell type. This procedure was repeated to merge the cell-type-specific peak sets by renormalizing the peak significance across the different cell types and performing the iterative overlap removal, resulting in a single merged peak set of fixed-width (501 bp) peaks, denoted the union peaks. We utilized both cell-type-specific peaks and the union peaks depending on the analysis that we performed. The addPeakMatrix function adds the PeakMatrix using the union peaks to the data object, which was used for downstream analyses such as adding peak-to-gene links and performing transcription factor motif analysis. To assess the quality of the peak calling procedure, we overlapped our cell-type-specific snATAC-seq peaks with the bulk adult bCREs, which were derived from adult brain tissues to study the regulatory landscape of the human brain⁷¹. From each of the 96 DNase-seq experiments, V4 cCREs (candidate cis-regulatory elements) with a Z-score > 1.64 were selected. V4 cCREs were obtained directly from ENCODE (https://screen.encodeproject.org)⁴⁰. Subsequently, cCREs with limited experimental support were removed and only those elements that were supported by at least five experiments were retained, which resulted in a collection of 253,638 adult bCREs. We used bedtools intersect (v2.30.0) with the minimum overlap fraction (-f), ranging from 0.2 to 1.0 to calculate the percentage of overlap.

snRNA-seq and snATAC-seq integration

We integrated our snATAC-seq with a subset of our snRNA-seq data (six high-quality samples that displayed high heterogeneity of cell types) using the addGeneIntegrationMatrix function. We used a constrained integration approach, which constrains the cells in the snRNA-seq data to the best-matched cells in the snATAC-seq data for the seven cell types. Once we had the GeneIntegrationMatrix, we identified peak-to-gene links using the addPeak2GeneLinks function, which leverages integrated snRNA-seq data to look for correlations between peak accessibility and gene expression. We set maxDist = 2 MB to consider long-range interactions. We used the plotPeak2GeneHeatmap function with k-means clusters = 20 to plot the side-by-side heatmaps of linked ATAC and gene regions.

Characterization of cell-type-specific CREs

Once we had the peak-to-gene links, we used thresholds correlation > 0.4 and FDR < 0.05 to identify peaks likely to regulate gene expression. The 395,932 cell-type-specific CREs, which are the unique peaks from this subset, are the backbone of all ATAC-related analyses. We identified CLGs and overlapped them with the cell-type-specific DEGs to identify CLG–DEGs for each cell type. These CREs were also used in the GWAS analyses to assess LDSC trait heritability and fine-map causal variants at PTSD risk loci in a cell-type-specific manner.

Characterization of condition-specific CREs

To investigate epigenomic differences between PTSD and MDD, we curated a set of condition-specific CREs in our ATAC data. First, we called peaks on each cell-type condition group (for example, EXN CON, EXN MDD and EXN PTSD) in the same manner as was done on the seven cell types. We identified unique peaks for each condition using bedtools intersect. After adding peak-to-gene links and identifying CREs that were likely to regulate gene expression, we identified a total of 141,939 condition-specific CREs across all cell-type condition groups. We then identified PTSD CLG–DEGs by overlapping the PTSD-specific CRE-linked genes and PTSD DEGs and MDD CLG–DEGs by overlapping the MDD-specific CRE-linked-genes and MDD DEGs for each cell type. Globally, we found 383 PTSD-specific CLG–DEGs and 1,063 MDD-specific CLG–DEGs that were specific to each condition.

snMultiome ATAC analysis

As we achieved subtype resolution in the neuronal clusters with the snMultiome data, we were able to call peaks on the seven cell types and 14 subtypes. We compared these peaks to the bCREs to assess peak calling quality and also compared them with the cell-type-specific ATAC peaks. The peak overlap ratio between ATAC and Multiome was calculated using bedtools intersect with a minimum overlap of one nucleotide. The ratio was then determined by dividing the number of intersecting peaks by the total number of ATAC peaks. With subtype resolution of neurons, we were able to identify the specific EXN or IN subtype that was most enriched in a particular trait using LDSC.

Transcription factor motif enrichment analysis and footprinting

Transcription factor motif enrichments were calculated to predict which regulatory factors were most active in a given cell type. The addMotifAnnotations function in ArchR indicates motif presence (P < 5 × 10⁻⁵) in the union peaks using a binary matrix, and the peakAnnoEnrichment function tests either cell-type-specific marker peaks or differential peaks for enrichment of various motifs. Besides the previous transcription factor motif enrichments, the R package chromVAR predicts enrichment of transcription factor activity on a per-cell basis while controlling for known technical biases. The addDeviationsMatrix function, which adds the MotifMatrix to the data object, was used to compute deviation z-scores for each motif by comparing the number of fragments that map to peaks containing the motif to the expected number of fragments in a background peak set that accounts for confounding factors such as GC content bias, PCR amplification and Tn5 tagmentation. Then, we used the getPositions, addGroupCoverages and getFootprints functions to perform transcription factor footprinting analysis with pseudobulk aggregates of cells in the same cell type. In the plotFootprints function, ArchR performs normalization by subtracting the Tn5 bias from the footprinting signal. We plotted the footprinting results for the transcription factors EGR2 and TFAP4 comparing PTSD and CON in EXNs and INs, respectively.

Transcription factor gene-regulatory network construction

Transcription factor–CRE–gene links were created for each cell type with DEG (up or down) and PTSD GWAS gene (true or false) status indicated for each gene. We created GRNs (gene regulatory networks) using EXN transcription factor–CRE–gene links and IN transcription factor–CRE–gene links using three highly enriched transcription factors in each cell type. We used the R package igraph (v1.2.6) to plot the GRN, showing transcription factor-to-target gene links and overlaying additional information such as whether the target gene is a DEG or PTSD GWAS gene. For the GRNs, we used correlation cut-offs of 0.7 and 0.6 for EXNs and INs, respectively, for visualization purposes. For the EXN GRN, we indicated all genes that overlapped with the iPS cell DEGs for EGR2 and SMARCC1 in pink. Across all transcription factor–CRE–gene linkage analyses, we used the FDR value of the transcription factor–CRE linkage as a way to test multiple comparisons and indicate statistical significance.

Estimating GWAS enrichment using ATAC and Multiome peaks

To estimate heritability of various complex traits, we used LDSC (v1.0.1)⁷². GWAS summary statistics for PTSD-related traits (tota lPCL, case–control, re-experiencing, avoidance and hyperarousal), other psychiatric traits (schizophrenia, MVP depression, Alzheimer’s disease and alcoholism) and non-psychiatric traits (irritable bowel disease, diabetes, height and education) were correctly formatted with munge_sumstats.py and lifted to hg38 coordinates using UCSC liftover. Cell-type-specific peaks were formatted for LDSC using make_annotation.py, and linkage disequilibruim scores were computed for each set using ldsc.py. Benjamini–Hochberg multiple-testing correction was applied to the enrichment P values. We compared the results using single-cell peaks from snATAC-seq or snMultiome with the bulk bCREs, and saw enhanced enrichment using single-cell data.

GWAS fine-mapping

For PTSD GWAS loci screening, we first identified risk loci from two previous PTSD GWAS^5,7. We selected a neighbourhood of around 1–2 Mb that included the MVP-lead SNP. After fine-mapping these loci using the total PCL and re-experiencing summary statistics, we identified eight that fell within cell-type-specific EXN and IN CREs. For peak enrichment scores as prior weights, we performed genetic fine-mapping with the sum of single-effect (SuSiE, v0.12.35) regression model⁷³ to infer risk variants with epigenetic information from our snATAC peaks, such that the prior can improve the accuracy of detecting causal variants⁷⁴. For SNPs within ATAC peaks, we used the peak enrichment scores (−log₁₀P), which indicate the strength of each peak, as the prior weights for SuSiE to prioritize SNPs with strong ATAC signals. The enrichment scores in the union peakset ranged from 1 to 507 with a mean of 19.7 and standard deviation of 28.3. To reduce the effects of highly enriched peaks, we set the enrichment score values above the 95th percentile to the value of the 95th percentile (67.49) and rescaled the scores to 1–100. For SNPs outside ATAC peaks, we set the prior weights to 0.1. We used CREs (peak-to-gene correlation > 0.4 and FDR < 0.05) to perform fine-mapping in SuSiE. For fine-mapping with SuSie, we performed SuSiE regression with the susie_rss function using the PTSD GWAS summary statistics (total PCL and re-experiencing)⁵. SuSiE calculates the PIP, the probability for a given SNP being causally associated with the trait of interest, for each variant. The linkage disequilibrium information was derived from the 503 European samples of the 1000 Genomes Project (1KG). For input files, we used the 1KG Phase 3 dataset⁷⁴ as the reference panel for our analysis. This dataset can be accessed via the Resources section on the PLINK 2.0 website (Resources — PLINK 2.0 (cog-genomics.org)). We selected the European samples based on the super-population information provided by 1KG and excluded all duplicate and ambiguous SNPs. We applied quality control to the 1KG data with European ancestry using PLINK⁷⁵. To be more specific, we only included SNPs with a 99% genotyping rate and the minor allele frequency no less than 0.005. We also excluded samples with more than 5% missing genotypes and markers that failed to pass the Hardy–Weinberg test. The final reference panel included 503 non-overlapping European samples, genotyped at 8,190,311 SNPs. To obtain the coordinates for risk SNPs, we turned to the UCSC Genome Browser⁷⁶ and focused on the human genome version GRCh38/hg38. For fine-mapping figures, after running SuSie, we integrated all the findings including the ATAC signal track, the CRE–gene links that link to the TSS of the risk gene, the PIP output values from SuSie, correlation of each variant to the lead SNP, and the gene track into the final fine-mapping figure using the predetermined regions. We plotted all variants with a PIP > 0.0001, highlighted CRE–gene loops for variants with PIP > 0.05, and outlined each variant in black if it fell within a CRE. The top two credible SNPs and the MVP SNP were labelled. The lead SNP was coloured red, and the MVP SNP was coloured purple. The CRE–gene loops were colored by their CRE–gene correlation values and by cell type (IN in green, and EXN in red). For HiC TAD, for ELFN1, KCNIP4, EGR3 and LRFN5, contact domains were found from the NeuN⁺ Hi-C file⁷⁷ using the Juicer (v1.6)⁷⁸ arrowhead command with parameters r = 10,000 (the resolution of the sequencing depth of the Hi-C file) and m = 1,000 (the size of the sliding window along the diagonal in which contact domains will be found). The TADs highlighted the regions of chromosomal looping in the fine-mapping figures.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.