Friday, December 19, 2025
No menu items!
HomeNatureSpatiotemporal cellular map of the developing human reproductive tract

Spatiotemporal cellular map of the developing human reproductive tract

Samples

Fetuses were obtained after voluntary terminations of pregnancy, which were performed either via medical or surgical procedures. The termination methods used did not compromise the integrity or morphology of the fetuses analysed in this study. Only well-preserved fetuses, without evidence of structural damage, were included. All tissue samples used for this study were obtained with written informed consent from all participants in accordance with the guidelines in The Declaration of Helsinki 2000. The human embryonic and fetal material was provided by the Joint MRC–Wellcome Trust (grant number MR/R006237/1 and MR/X008304/1) Human Developmental Biology Resource (HDBR, http://www.hdbr.org), with appropriate maternal written consent and approval from the Fulham Research Ethics Committee (REC reference 18/LO/0822 and 23-LO/0312) and Newcastle & North Tyneside 1 Research Ethics Committee (REC reference 18/NE/0290). The HDBR is regulated by the UK Human Tissue Authority (HTA; www.hta.gov.uk) and operates in accordance with the relevant HTA Codes of Practice. This research was also supported by the NIHR Cambridge Biomedical Research Centre (NIHR203312). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Assignment of developmental stage

Embryos up to 8 PCW were staged using the Carnegie staging method62. At stages beyond 8 PCW, age was estimated from measurements of foot length and heel-to-knee length and compared with the standard growth chart63. A piece of skin or, where this was not possible, chorionic villi tissue was collected from every sample for quantitative PCR analyses using markers for the sex chromosomes and the autosomes 13, 15, 16, 18, 21 and 22, which are the most commonly seen chromosomal abnormalities. All samples were karyotypically normal.

Tissue processing

All tissues for sequencing and spatial work were collected in HypoThermosol FRS Preservation solution (Sigma-Aldrich) and stored at 4 °C until processing. Tissue dissociation was conducted within 24 h of tissue retrieval with the exception of tissues that were cryopreserved and stored at −80 °C (Supplementary Table 1).

We used a previously described protocol optimized for gonadal dissociation and available at protocols.io64. In brief, tissues were cut into <1 mm3 segments before digestion with a mix of trypsin–EDTA 0.25% and DNase I (0.1 mg ml–1) for 5–15 min at 37 °C with intermittent shaking. Samples >17 PCW were digested using a combination of collagenase and trypsin–EDTA using a previously described protocol64,65, but with modifications. In brief, samples were first digested with a mix of collagenase 1A (1 mg ml–1), DNase I (0.1 mg ml–1) and Liberase TM (50 µg ml–1) for 45 min at 37 °C with rotation. The cell solution was further digested with trypsin–EDTA 0.25% for 10 min at 37 °C with rotation. In both protocols, digested tissue was passed through a 100 µm filter and cells were collected by centrifugation (500g for 5 min at 4 °C). Cells were washed and resuspended in PBS–BSA 0.04% before cell counting.

Single-nucleus suspension

Single-nucleus suspensions were isolated from dissociated cells when performing scATAC–seq, following the manufacturers’ instructions, and from frozen tissue sections when performing multi-omic snRNA-seq and scATAC–seq. For the latter, thick (300 µm) sections were cryosectioned and kept in a tube on dry ice until subsequent processing. Nuclei were released by Dounce homogenization as described in detail in the methods at protocols.io (https://doi.org/10.17504/protocols.io.bp2l6n1xkgqe/v1).

Tissue cryopreservation

Fresh tissue was cut into <1 mm3 segments before being resuspended with 1 ml ice-cold Cryostor solution (CS10, C2874-Sigma). Tissue was frozen at −80 °C, decreasing the temperature approximately 1 °C min–1. A detailed protocol is available at protocols.io (https://doi.org/10.17504/protocols.io.bgsnjwde).

Tissue freezing

Fresh tissue samples of the human developing reproductive tract were embedded in cold OCT medium and flash-frozen using a dry ice–isopentane slurry.

H&E staining and imaging

Fresh-frozen sections were removed from −80 °C storage and air dried before being fixed in 10% neutral-buffered formalin for 5 min. After rinsing with deionized water, slides were dipped in Mayer’s haematoxylin solution (QPath) for 90 s. Slides were completely rinsed in 4–5 washes of deionized water, which also served to blue the haematoxylin. Aqueous eosin 1% (Leica) was manually applied onto sections with a pipette and rinsed with deionized water after 1–3 s. Slides were dehydrated through an ethanol series (70%, 70%, 100% and 100%) and cleared twice in 100% xylene. Slides were coverslipped and allowed to air dry before being imaged on a Hamamatsu Nanozoomer 2.0HT digital slide scanner.

Multiplexed smFISH and high-resolution imaging

Large-tissue section staining and fluorescence imaging were conducted largely as previously described66. Sections were cut from fresh-frozen or fixed-frozen samples embedded in OCT at a thickness of 10 μm using a cryostat, placed onto SuperFrost Plus slides (VWR) and stored at −80 °C until stained. Tissue sections were then processed using a Leica BOND RX to automate staining with a RNAscope Multiplex Fluorescent Reagent kit v2 assay (Advanced Cell Diagnostics, Bio-Techne) according to the manufacturers’ instructions. Details of the probes used are provided in Supplementary Table 3. Before staining, human fresh-frozen sections were post-fixed in 4% paraformaldehyde in PBS for 15 min at 4 °C, then dehydrated through a series of 50%, 70%, 100% and 100% ethanol for 5 min each. Following manual pretreatment, automated processing included epitope retrieval by protease digestion with protease IV for 30 min before probe hybridization. Subsequently, the automated processing for these sections included heat-induced epitope retrieval at 95 °C for 5 min in buffer ER2 and digestion with protease III for 15 min before probe hybridization. Tyramide signal amplification with Opal 520, Opal 570 and Opal 650 (Akoya Biosciences) and TSA-biotin (TSA Plus Biotin kit, Perkin Elmer) and streptavidin-conjugated Atto 425 (Sigma Aldrich) was used to develop RNAscope probe channels.

Stained sections were imaged with a Perkin Elmer Opera Phenix High-Content Screening system in confocal mode with 1-μm z step size, using a ×20 (NA 0.16, 0.299 μm pixel–1), ×40 (NA 1.1, 0.149 μm pixel–1) or ×63 (NA 1.15, 0.091 μm pixel–1) water-immersion objective. The following channels were used: DAPI, excitation (ex.) of 375 nm and emission (em.) of 435–480 nm; Atto 425, ex. of 425 nm and em. of 463–501 nm; Opal 520 ex. of 488 nm and em. of 500–550 nm; Opal 570, ex. of 561 nm and em. of 570–630 nm; and Opal 650, ex. of 640 nm and em. of 650–760 nm.

Image stitching

Confocal image stacks were stitched as two-dimensional maximum-intensity projections using proprietary Acapella scripts provided by Perkin Elmer.

10x Genomics Chromium GEX library preparation and sequencing

For the scRNA-seq experiments, cells were loaded according to the manufacturer’s protocol for the Chromium Next GEM Single Cell 5′ v2 (DUAL) kit (10x Genomics) to attain between 2,000 and 10,000 cells per reaction. Library preparation was carried out according to the manufacturer’s protocol. Libraries were sequenced, aiming at a minimum coverage of 20,000 raw reads per cell, on Illumina HiSeq 4000 or Novaseq 6000 systems using the following sequencing format: read 1, 26 cycles; i7 index, 8 cycles; i5 index, 0 cycles; read 2, 98 cycles.

For the scATAC–seq and multimodal snRNA-seq and scATAC–seq experiments, cells were loaded according to the manufacturer’s protocol for the Chromium Single Cell ATAC v2 and Chromium Next GEM Single Cell Multiome ATAC+Gene Expression (10x Genomics) to attain between 2,000 and 10,000 cells per well. Library preparation was carried out according to the manufacturer’s protocol. Libraries for scATAC–seq were sequenced on an Illumina NovaSeq 6000 system, aiming at a minimum coverage of 10,000 fragments per cell, with the following sequencing format: read 1, 50 cycles; i7 index, 8 cycles; i5 index, 16 cycles; read 2, 50 cycles.

10x Genomics Visium library preparation and sequencing

Cryosections (10 µm) were cut and placed on Visium slides. These were processed according to the manufacturer’s instructions. In brief, sections were fixed with cold methanol, stained with H&E and imaged on a Hamamatsu NanoZoomer 2.0HT before permeabilization (24–30 min), reverse transcription and cDNA synthesis using a template-switching protocol. Second-strand cDNA was liberated from the slide and single-indexed libraries were prepared using a 10x Genomics PCR-based protocol. Libraries were sequenced (1 per lane on a HiSeq4000), aiming for 300 million raw reads per sample, with the following sequencing format: read 1, 28 cycles; i7 index, 8 cycles; i5 index, 0 cycles; read 2, 91 cycles.

10x Genomics Visium CytAssist library preparation and sequencing

Cryosections (10 µm) were collected onto SuperFrost Plus slides (VWR) and processed according to the 10x CytAssist protocol (CG000614 and CG000495). In brief, sections were fixed in methanol, H&E stained and imaged on a Hamamatsu Nanozoomer 2.0HT. After destaining, human whole transcriptome probe pairs were hybridized and ligated to the tissue RNA. The ligation products were then released and captured onto Visium slides using a CytAssist instrument. The probes were then extended to incorporate the spatial barcodes from the Visium slide, eluted and prepared into a dual-indexed library. Libraries were sequenced (4 samples per Illumina Novaseq SP flow cell) aiming for a minimum 25,000 read pairs per spot, with the following sequencing format: read 1, 28 cycles; i7 index, 10 cycles; i5 index, 10 cycles; read 2S, 90 cycles.

Customized ISS pipeline

ISS was performed using a 10x Genomics CARTANA HS Library Preparation kit (1110-02, following protocol D025) and an In Situ Sequencing kit (3110-02, following protocol D100), which comprise a commercialized version of HybISS67. A reproductive tract section was fixed in 3.7% formaldehyde (Merck 252549) in PBS for 30 min, washed twice in PBS for 1 min each, permeabilized in 0.1 M HCl (Fisher 10325710) for 5 min and washed twice again in PBS, all at room temperature. Following dehydration in 70% and 100% ethanol for 2 min each, a 50, 100 or 150 μl volume (depending on the size of the section) SecureSeal hybridization chamber (Grace Bio-Labs GBL621505-20EA) was adhered to the slide and used to hold subsequent reaction mixtures. Following rehydration in buffer WB3, probe hybridization in buffer RM1 was conducted for 16 h at 37 °C. The 171-plex probe panel included 5 padlock probes per gene, the sequences of which are proprietary (10x Genomics CARTANA). The section was washed with PBST (PBS with 0.05% Tween-20) twice, then with buffer WB4 for 30 min at 37 °C, and thrice again with PBST. Probe ligation in RM2 was conducted for 2 h at 37 °C and the section washed thrice with PBST, then rolling circle amplification in RM3 was conducted for 18 h at 30 °C. Following PBST washes, all rolling circle products (RCPs) were hybridized with LM (Cy5 labelling mix with DAPI) for 30 min at room temperature, the section was washed with PBST and dehydrated with 70% and 100% ethanol. The hybridization chamber was removed and the slide mounted with SlowFade Gold Antifade mountant (Thermo, S36937).

Imaging of Cy5-labelled RCPs at this stage acted as a quality-control step to confirm RCP (‘anchor’) generation and served to identify spots during decoding. Imaging was conducted using a Perkin Elmer Opera Phenix Plus High-Content Screening system in confocal mode with 1-μm z step size using a ×63 (NA 1.15, 0.097 μm pixel–1) water-immersion objective and 7% overlap between adjacent tiles. The following channels were used: DAPI, ex. of 375 nm and em. of 435–480 nm); Atto 425, ex. of 425 nm and em. of 463–501 nm; Alexa Fluor 488, ex. of 488 nm and em. of 500–550 nm; Cy3, ex. of 561 nm and em. 570–630 nm; and Cy5, ex. of 640 nm and em. of 650–760 nm.

Following imaging, the slide was de-coverslipped vertically in PBS (gently, with minimal agitation such that the coverslip fell off to prevent damage to the tissue). The section was dehydrated with 70% and 100% ethanol, and a new hybridization chamber was secured to the slide. The previous cycle was stripped using 100% formamide (Thermo AM9342), which was applied fresh each minute for 5 min, then washed with PBST. Barcode labelling was conducted using two rounds of hybridization: first, with an adapter probe pool (AP mixes AP1–AP6, in subsequent cycles), then a sequencing pool (SP mix with DAPI, customized with Atto 425 in place of Alexa Fluor 750), each for 1 h at 37 °C with PBST washes in between and after. The section was dehydrated, the chamber removed and the slide mounted and imaged as described above. This process was repeated another five times to generate the full dataset of seven cycles (anchor and six barcode bits).

Derivation and maintenance of fetal uterine organoids

Fetal uterine organoids were derived from 12 PCW (Hrv276-ORG) and 17 PCW (Hrv277-ORG) fetal reproductive tract samples, (developing uterus, cervix and vagina) following tissue dissociation as described above. The single-cell suspensions were washed in Advanced DMEM/F12 (Gibco, 12634-010), centrifuged and the cell pellet resuspended in Matrigel (Corning, 356231) at around a 1:3 ratio (pellet volume to Matrigel volume). The organoids were cultured as previously described68, forming in 25 µl Matrigel domes in 48-well tissue treated plates covered by 250 µl basal endometrial organoid medium (Advanced DMEM/F12 (Gibco, 12634-010), 1% GlutaMAX (Gibco, 35050061), 1% insulin–transferrin–selenium (ITS-G) (Gibco, 41400045), 100 µg ml–1 primocin (Invivogen, ant-pm-1), 1× B27-vitamin A (Life Technologies, 12587010), 1× N2 (Life Technologies, 17502048), 1.25 mM N-acetyl-l-cysteine (Sigma-Aldrich, A9165-5G), 2 mM nicotinamide (Sigma, N0636-100G), 2 ng ml–1 recombinant human FGF-basic (154 amino acids) (Peprotech, 100-18B), 500 ng ml–1 recombinant human R-spondin-1 (R&D Systems, 4645-RS-01M/CF), 10 µM SB202190 (p38i) (StemCell Technologies, 72632), 500 nM A83-01 (Tocris, 2939), 50 ng ml–1 recombinant human EGF (Peprotech, AF-100-15), 10 ng ml–1 recombinant human FGF-10 (Peprotech, 100-26) and 100 ng ml–1 recombinant human Noggin (Peprotech, 120-10 C))68. The medium was supplemented with 10 µM of the ROCK inhibitor Y-27632 (Millipore, 688000) for the first 2 days of organoid line establishment, with full medium changes every 2–3 days.

All organoid lines were split and passaged approximately every 5–7 days depending on their size and density. TrypLE Express Enzyme (Gibco, 12604013) was added to each well, and domes were detached with either a 1,000 µl tip or cell scraper before being transferred to a 15 ml Falcon tube. The organoids were dissociated into cell clumps by forcefully pipetting the solution 15–30 times using a 1,000 µl tip, followed by incubation at 37 °C for 6–8 min. Advanced DMEM/F12 was added at 1:1 ratio to quench TrypLE Express Enzyme and pipetted up and down 10 more times with the 1,000 µl tip. Cell suspensions were centrifuged at 800g for 2 min at 4 °C. The supernatant was removed as close to the pellet as possible. Next, 30 µl cold Matrigel per desired dome were added and the pellet was slowly resuspended to evenly distribute the cells. A volume of 30 µl domes was seeded into 6-well, 12-well or 24-well tissue culture-treated plates depending on whether the organoids were being expanded or set-up for drug treatment. The domes were placed in an incubator for 10 min at 37 °C, followed by the addition of basal endometrial organoid medium supplemented with 10 µM Y-27632.

Treatment of fetal uterine organoids with BPA or BPP

For all drug treatment experiments, organoids were passaged 48 h before addition of the compound. Organoids were plated in 30 µl domes as described above in two technical replicates, each containing two organoid domes. For testing the effect of endocrine disruptors on fetal reproductive organoids, Hrv276-ORG and Hrv277-ORG lines were treated with either 10 μM of BPA (Sigma, 239658) or 100 μM BBP (Sigma, 308501), with controls receiving the same volume of DMSO as the compound administered. For the endocrine-disrupter experiments BPA, BBP or DMSO were added to basal endometrial organoid medium (as described above), but with phenol-red-free DMEM/F12 as the base medium (Merck, D6434). Organoids were treated with BPA or BBP for a total of 96 or 144 h, with full medium change every 48 h. After 96 or 144 h of drug treatment, organoids were dissociated into a single-cell suspension. In brief, organoids were collected and washed in ice-cold phenol-red-free DMEM/F12 (BPA, BBP or DMSO control) and centrifuged at 600g for 6 min at 4 °C. The supernatant was removed and replaced with TrypLE Express Enzyme at a ratio of 500 µl per 30 µl dome, and pipette-mixed with a p1,000 tip for 30 times. Organoid suspensions were incubated at 37 °C for 15–25 min, with manual shaking every 2 min. Cells were checked at 15 min then every 5 min until an adequate single-cell suspension was achieved of about 70% single cells. Once the cells were sufficiently digested, TrypLE was quenched with phenol-red-free DMEM/F12. Cells were re-centrifuged and medium was aspirated to leave around 50 µl of medium. The cell suspensions were then vigorously pipette-mixed with a p20 tip 30–60 times. To this suspension, 200 μl of 1% PBS–BSA was added, thoroughly mixed and passed through a 70 μm filter. Live cells were counted using Trypan blue. The cells were loaded into a 10x Genomics Chromium chip as described in the Chromium Next GEM Single Cell 5′ v2 (DUAL) kit.

Immunofluorescence of fetal uterine organoids

Fetal uterine organoids were grown and treated as described above in µ-Slide 8 Wells (ibidi, 80801). Organoids were fixed in 4% paraformaldehyde for 45 min at room temperature and washed several times in PBS. Cells were permeabilized and blocked for 2 h in 2% Triton-X + 5% FBS in PBST. Organoids were washed in PBS before incubation with primary antibodies. Antibodies were incubated in an antibody dilution buffer (0.25% Triton-X + 1% FBS in PBST) at 4 °C overnight. Organoids were stained with TRITC-conjugated phalloidin (Thermo Fisher Scientific, R37112, dilution according to the manufacturer’s instructions), Alexa 488-conjugated ZO-1 (Invitrogen, 339188; 1:200 dilution) and APC-conjugated EpCAM (BD biosciences, 347200; 1:200 dilution). Organoids were washed 3 times for 45 min each in PBS and then mounted in ibidi mounting medium (ibidi, 50011).

Organoids were imaged with a Perkin Elmer Opera Phenix High-Content Screening system in confocal mode with 10-μm z step size, using a ×20 (NA 0.16, 0.299 μm pixel–1) water-immersion objective. The following channels were use: DAPI, ex. of 375 nm and em. of 435–480 nm); Alexa 488, ex. of 488 nm and em. of 500–550 nm; TRITC, ex. of 561 nm and em. of 570–630 nm; and APC, ex. of 640 nm and em. of 650–760 nm.

Analysis of scRNA-seq data

Per-library analyses

For each sequenced scRNA-seq library, we performed read alignment to the human reference genome (GRCh38 2020-A), and mRNA quantification and initial quality control using STARsolo69 with default parameters. Ambient RNA contamination was inferred and subtracted from the original expression matrix using the deep generative model CellBender70. For multiplexed libraries, Souporcell71 was then applied to deconvolve the genotypes and to assign cells to their respective donors. Owing to the scarcity of human cell-type markers, each library was first analysed independently before integrating them to have a way of formally evaluating the quality of the integration. In brief, we used Scrublet72 for cell-doublet calling with a two-step diffusion doublet identification, as previously described73. Genes expressed by fewer than 3 cells were excluded, as were cells expressing fewer than 1,500 genes, more than 20% mitochondrial genes or with more than 40% of the scrublet score.

After converting the expression space to log [CPM/100 + 1], the anndata object was transposed to the gene space to identify cell cycling genes in a data-driven manner, as previously described73,74. Principal component analysis, neighbour identification and partition-based Leiden clustering75 were performed on the gene space, and then the members of the gene cluster, including known cycling genes (CDK1, MKI67, CCNB2 and PCNA), were flagged as the data-derived cell cycling genes and discarded in the downstream analysis. Back in the cell space, we identified highly variable genes, performed principal component analysis, computed the neighbourhood graph, Leiden clustering75 and UMAP76 for visualization in two dimensions. The per-library computational analysis workflow described so far was wrapped in a Nextflow77 pipeline with two processes to enable parallelization and reproducibility.

To identify genes characteristic of each cluster, we performed term frequency–inverse document frequency, a method borrowed from natural-language processing that reflects how important a word (gene) is to a document (cluster) in a corpus (dataset), as implemented in the R library SoupX78. Annotations were only finalized when analysing spatially resolved transcriptomics data (both 10x Visium and ISS). A detailed explanation of the cell types identified alongside their marker genes is provided in Supplementary Note 1.

Integration of scRNA-seq libraries

After having annotated each sample separately and realizing the significant differences in cell-type composition across samples, we generated three integrated views that best preserved the biological heterogeneity of this system: ≤10 PCW male and female samples together (when the sexually unmatched ducts are still present and the first signs of regionalization of the ducts appear); >10 PCW male samples; >10 PCW female samples. The variational autoencoder-based method scVI79, trained on the 7,500 most highly variable genes and with 30 latent variables and 2 hidden layers, was then applied to mitigate batch effects across donors in each of the three views. To evaluate the integrated manifold, we then overlaid the per-sample annotations and confirmed that the biological signal was preserved while correcting for donor-specific effects. Moreover, for a more quantitative evaluation of the integration results, we computed the Shannon entropy per Leiden cluster of the per-sample cell-type annotations as well as the donor and sex labels, following a previously described method80. Clusters with a donor label entropy equal to 0 (that is, donor-specific clusters) were removed from further analysis. Each cluster was then annotated on the basis of majority voting (≥40%) of the cell-type labels. Clusters showing high cell-type label entropy (that is, <40% of cells in the cluster having the same cell-type label) were further investigated and annotated according to their most expressed term frequency–inverse document frequency markers.

Finally, for visualization purposes only, all libraries were integrated with scVI (7,500 highly variable genes, 60 latent variables), and cell-type labels were overlaid from the per-view annotations. Variations in cell-type proportions across developmental time were visualized with an area plot.

Per-organ analyses

The cellular and molecular features of the establishment of sexual dimorphism in each organ of the developing human reproductive tract were investigated by performing subanalyses on the following relevant cell types:

  • All cell types in male and female external genitalia during the MPW (8–14 PCW)

  • Coelomic epithelium, Müllerian duct epithelium and mesenchyme during the period of Müllerian duct emergence (6–8 PCW)

  • Mesenchymal and epithelial cells from the differentiating Müllerian and Wolffian ducts (>10 PCW), resulting in four subanalyses (Müllerian-derived mesenchyme, Müllerian-derived epithelium, Wolffian-derived mesenchyme and Wolffian-derived epithelium).

In all these per-organ analyses, preprocessing was carried out analogously to what is described in the per-library analysis section, and Harmony81 (theta = 0) was used to correct for batch effects.

Human–mouse comparison of external genitalia

We leveraged the availability of an annotated scRNA-seq dataset of mouse genital tubercle (comprising two male and female samples for each of three developmental stages: embryonic day 14.5 (E14.5), E16.6 and and E18.5)23 to identify the transcriptional regulators underpinning sexual dimorphism in the corpus spongiosum in each species. To define a shared feature space, we first took the set of orthologous genes expressed in at least ten cells in both species. From the set of common orthologous genes, we then computed the top 4,000 highly variable genes in each species and identified their intersection (around 2,700 genes). Using the intersection of highly variable genes, batch effects owing to different donors or mice were corrected by means of Harmony81, and a Milo82 object was computed on each species’ batch-corrected embedding. Each neighbourhood from a species was matched to its k (k = 30) closest neighbourhoods in the other species (in both the mouse-to-human and human-to-mouse directions). The final matches were formed by identifying the mutually nearest pairs of neighbourhoods that appeared in both directions83. Each matched neighbourhood was then annotated by majority voting of the labels of the cells in the neighbourhood. Matching across cell-type labels was visualized with an alluvial plot. We then considered the unique combination of matching cell-type labels as the harmonized cell-type annotations across species and focused on the mouse cells that matched the human corpus spongiosum label for differential expression analysis between male and female individuals.

Differential expression analysis in the human and mouse genital tubercle

We conducted differential expression analysis between male and female samples in the human and mouse corpus spongiosum and human urethral epithelium using PyDESeq284. Only samples between 8 and 14 PCW (the so-called MPW) were included in the analysis. Genes mapping to the Y chromosome were excluded from differential expression testing. Results of the differential expression analysis were visualized with a Volcano plot showing the genes with |log[FC > 1]| and adjusted P < 0.05.

Cell–cell interaction analysis from scRNA-seq data

Sexually dimorphic cell–cell interactions between the mesenchymal corpus spongiosum and the urethral epithelium during the MPW (8–14 PCW) were inferred using CellphoneDB (v.5.1)85 using method 3 (differential expression analysis). After splitting both cell clusters into male and female, differentially expressed genes for each cell type–sex combination were identified using the FindAllMarkers() function in Seurat86, with only positive markers being considered. Only genes expressed in at least 10% of the cells in a cell type–sex combination were considered for this analysis. The search for interaction was restricted to cell types in the same sex (which was passed as input to the method in the ‘microenvironments’ file).

Wolffian-to-Müllerian duct signalling during the developmental time window of Müllerian duct emergence (6–8 PCW) was also explored with CellphoneDB (v.5.1)85 using method 3. Differentially expressed genes per cell type (that is, Wolffian mesenchyme, Wolffian epithelium, Müllerian mesenchyme and Müllerian epithelium) were similarly computed using the FindAllMarkers() function in Seurat86 (with only positive markers being considered), and only genes expressed in at least 10% of the cells in a cell type were considered for this analysis. All cell types belong to the same ‘microenvironment’.

Trajectory inference and differential expression along trajectories

The trajectory inference method Slingshot87 was applied to recover the lineages originating from the coelomic epithelium during Müllerian duct emergence (6–8 PCW). The pseudotime ordering of the cells along with the weighted assignment of each cell to the three lineages were then used as input for TradeSeq88 to extract genes that were differentially expressed along a lineage or between lineages with the associationTest() function.

Inference of clinically approved drugs potentially disrupting reproductive epithelia

Using drug2cell89, we derived drug scores for compounds in the CHEMBL database by averaging the expression levels of target genes in each cell on the three views of the dataset independently. We then performed a Wilcoxon rank-sum test to identify significant differences in drug scores between each reproductive epithelium (that is, Müllerian duct epithelium, Wolffian duct epithelium, urogenital sinus epithelium and urethral epithelium in <10 PCW female and male embryos; fallopian tube epithelium, uterocervix epithelium, upper vagina epithelium, vaginal plate epithelium and urethral epithelium in >10 PCW female fetuses; epididymis epithelium, vas deferens epithelium, prostate epithelium and urethral epithelium in >10 PCW fetuses) and all other reproductive-specific cells in the dataset. Results were filtered based on adjusted P values (<0.01), log fold changes (>2) and rank scores to select the most significant drugs associated with the target cell type. An additional filtering step was performed to exclude drugs for which target genes were not specific to the target cell type and required that the targets were expressed in at least 10% of cells in the target cell type.

In vivo–in vitro comparison

In vivo epithelial cells from the uterovaginal canal were used to train a CellTypist90 model with the top 200 genes per cell type (as ranked by their absolute regression coefficients associated with each cell type) as features. The trained model was then used to predict the cell-type labels in the untreated organoids, and the predicted probabilities were visualized using a dot plot.

To assess changes in cellular abundance following perturbation, Milo82 was used to construct a k-nearest neighbour (kNN) graph on embeddings integrated using Harmony81 for untreated versus BPA-treated and untreated versus BBP-treated organoids. Differential abundance testing was performed by assigning cells to neighbourhoods and applying a generalized linear model to compare the proportion of BPA-treated or BBP-treated cells relative to untreated controls, accounting for differences in cellular sampling. Differential expression analysis was conducted by comparing transcriptomic profiles of cells in differentially abundant neighbourhoods to all remaining neighbourhoods in each dataset. Genes with |log[FC]| > 0.5 and adjusted P < 0.05 were considered significant, and results were visualized using a volcano plot.

Gene set enrichment analysis was performed using EnrichR91 to identify biological pathways associated with BPA-induced or BBP-induced gene expression changes. Upregulated genes were compared against the MSigDB Hallmark 202092 gene sets, and significantly enriched pathways (adjusted P < 0.05) were visualized using a bar plot.

Analysis of scATAC–seq data

Per-library analyses

scATAC–seq libraries were processed (read filtering, alignment, barcode counting and cell calling) with 10x Genomics Cell Ranger ATAC pipeline v.2 using the pre-built 10x GRCh38 genome (v.3.1.0) as a reference. Similar to RNA, we analysed each ATAC library independently until cell-type annotation to evaluate the quality of the subsequent integrations using the ArchR framework93. Cells with fewer than 3,500 fragments or a minimum transcription start site enrichment of 10 were filtered out, as were cells deemed as doublets. Dimensionality reduction on the tile matrix was performed with Latent Semantic Indexing, and the low-dimensional variables were then used to compute the neighbourhood graph, partition-based Leiden clustering75 and UMAP visualization76.

To annotate cells, we used canonical correlation analysis to match the gene activity score matrix of each scATAC–seq library with the integrated gene-expression space of the corresponding view (for example, if the scATAC–seq library came from a female sample older than 10 PCW, we used the >10 PCW female-integrated scRNA-seq dataset). For optimal reproducibility and parallelization, the per-sample scATAC–seq analyses were also wrapped in a Nextflow script.

Integration of scATAC–seq libraries

The individually annotated samples were then integrated by re-computing latent semantic indexing on the concatenated tile matrices and using mutual nearest neighbours83 to correct for batch effects. Mutual nearest neighbours has proven to be highly effective for scATAC–seq (for which we do not have as many samples as scRNA–seq), as it enables us to specify the order of integration and hence mitigate batch effects in a more biologically informed fashion.

Integrative analysis of scRNA-seq and scATAC–seq data

Combining information from transcriptomics and chromatin accessibility data enables the prioritization of transcription factors that are likely to be active in each cell type, along with the identification of putative regulatory relationships between transcription factors and target genes. We sought to do this when investigating the regulatory landscape underlying the process of urethral canalization in males, Müllerian duct emergence from the coelomic epithelium, and patterning of the mesenchyme and epithelium during Müllerian and Wolffian duct differentiation.

Using the fragment files and cell annotations obtained through label transfer from the scRNA-seq data, pseudobulks were generated per cell type. Cell-type-specific peaks were called using MACS294 as implemented in the pycistopic workflow95. A set of consensus peaks was derived through iterative overlapping, and the resulting matrix of cells by consensus peaks was used as input to topic modelling with latent dirichlet allocation. Latent dirichlet allocation models were selected according to the metrics provided in pycistopic. Harmony81 was used to correct for the donor effect. The manifold was clustered with the Leiden algorithm75 and differentially accessible regions were computed between the Leiden clusters. The union of the differentially accessible regions and the regions contained in topics (obtained by binarizing the region-topic probabilities) served as the set of regions used to find transcription factor-binding motifs with pycistarget95.

To infer enhancer-driven gene regulatory networks, scRNA-seq and scATAC–seq data were eventually combined into a pseudo-multiome dataset using SCENIC+95. In essence, SCENIC+ generates metacells that contain cells from both data modalities by randomly sampling cells of the same cell-type label. Within a 150-kb region around each gene, gradient-boosting machines and correlation analysis were used to infer the relationships between enhancers and genes, as well as between transcription factors and genes. This approach enabled the identification of enhancers that are associated with the regulation of specific target genes and the inference of transcription factors that potentially contribute to the regulation of these genes.

Analysis of 10x Visium data

Per-library analyses

Visium data consist of FASTQ sequencing files and a bright-field microscopy image stained with H&E per capture area. The Space Ranger (v.2.0.0) software provided by 10x Genomics was used to align the barcoded spot pattern to the H&E tissue image and to differentiate tissue from background. The resulting spot-by-transcript abundance matrix was analysed using the package squidpy.

Estimation of cell-type abundances per Visium spot using matched scRNA-seq data

To deconvolve the transcriptional signal coming from each Visium spot into an estimated abundance of each cell type present in the >10 PCW female and male views of the scRNA-seq dataset, we applied the Bayesian model cell2location96. In brief, cell2location first estimates reference cell-type signatures from the dissociated scRNA-seq data using negative binomial regression. It then decomposes the spatially resolved Visium RNA count matrices into the reference cell-type signatures.

Annotation of anatomical and histological features

We used the microscopy H&E images to annotate anatomical structures and histological features independently of gene expression. Using the package TissueTag33, anatomical features were manually labelled, whereas histological features were inferred using a random forest classifier trained on a few manually labelled points. Estimated cell-type abundances per spot in each anatomical structure (or histological feature) were then averaged, and an enrichment score of cell type per anatomical structure (or histological feature) was computed.

Scoring of the fetal fallopian tube decreasing signature in adult data

To evaluate whether the rostrocaudal decreasing pattern of expression of the 15 genes identified in the fetal fallopian tube was preserved into adulthood, we generated 10x Visium spatial transcriptomics data from three areas of an adult fallopian tube (fimbriae, ampulla and isthmus) and scored the epithelial spots in each capture area for the average expression of this gene signature. In brief, we used the scanpy97 function sc.tl.score_genes() to compute the score and then performed the Jonckheere’s trend test98 with the alternative hypothesis ‘decreasing’ (2000 permutations) to quantify the significance of the trend. The same approach was used to test the rostrocaudal increasing pattern of expression.

Analysis of ISS data

Probe selection

To locate cell types and states as they appear and disappear during development of the reproductive tract (especially during early stages of development, which cannot be assayed with spot-based spatial transcriptomics), we designed a panel of 171 genes. The most unique marker genes per cell type were selected using term frequency–inverse document frequency, and the resulting panel was evaluated for completeness using geneBasis99. To evaluate completeness at the cell level, geneBasis checks for preservation of a cell’s neighbourhood in the kNN graph built with all the highly variable genes and the kNN graph constructed with the gene panel by comparing the distance between each cell and these two sets of nearest neighbours. geneBasis was applied to evaluate the same gene panel on the three views of the scRNA-seq data (≤10 PCW male and female embryos, >10 PCW male fetuses and >10 PCW female fetuses) separately. However, as thoracic HOX genes are not cited in the literature as relevant to the differentiation of the reproductive tract, these were not included in the original gene panel. We therefore decided to swap four of the genes in the original panels (DPP4, DPP6, CRISP3 and DPP10) for four HOX genes (HOXC4, HOXC6, HOXA7 and HOXC10) and ran ISS on some samples with this resulting panel instead (Supplementary Table 3).

Preprocessing

ISS data were preprocessed with a computational pipeline implemented by T.L. in Nextflow. In brief, the first step of the pipeline involved integrating the tiles along the z axis using maximal projection for each channel and then stitching them together along the remaining x and y spatial axes. Because the tissue moves slightly between sequencing rounds, image registration was required to correct for spatial misalignment of the fluorescent spots. This registration was achieved using nonlinear optical flow to align the small, Gaussian-like spots in the images. Once the images were stitched and corrected for misalignment, we used the DAPI signal to segment nuclei with CellPose100. For computational efficiency, images were first sliced into smaller 10,000 × 10,000 pixels patches. Even though there was no membrane protein staining, cell segmentation could be obtained by expanding the segmented nuclei by about 10 or 15 pixels to mimic the cytoplasm. Finally, spots appearing across registered coding channels were detected and their intensities were extracted. The intensity values for each spot over imaging cycles were then decoded on the basis of the collection of barcodes in the codebook provided by CARTANA with the PoSTcode algorithm101. To minimize false positives, PoSTcode inflates the codebook with an additional background barcode. The stacks of image values, representing the intensities across different channels and imaging cycles, were modelled using a matrix-variate Gaussian mixture model, assuming correlations between channels and imaging cycles. Decoded spots were ultimately assigned to segmented cells, which resulted in a gene-expression matrix used for downstream analyses.

Cell-type annotation of ISS data based on matched scRNA-seq data

Although the ISS gene panel was designed to maximize cell-type recovery, the throughput of the assay was still too limited to confidently assign cell-type identities solely on the basis of examining the measured gene expression. We therefore leveraged the full transcriptome resolution of the scRNA-seq dataset to increase the confidence in cell-type assignment using an approach based on kNN graphs implemented in the iss-patcher library102. Using this approach, we annotated the cells of ISS samples in a per-view fashion (for example, ISS samples for ≤10 PCW were annotated using the ≤10 PCW scRNA-seq reference). In early (≤10 PCW) samples, ISS cells for which the anatomical annotation was ‘gonad’ were excluded from the label-transfer algorithm because there are no gonad cells in the scRNA-seq reference data (by design).

Annotation of anatomical structures

The experimental workflow of ISS currently does not involve the acquisition of a H&E image of the sample. We overcame the limitation of not having a H&E image by generating a ‘virtual’ RGB image from the gene-expression profiles of three highly expressed markers (one per R, G and B channel). Major anatomical landmarks were therefore annotated on the virtual RGB image based on the morphology and the H&E image of the consecutive section. Histological landmarks were not annotated, as they necessitate the texture information captured by H&E staining only. By combining the information about the cell-type label of each ISS cell and its annotated anatomical location in the tissue architecture, we then computed an enrichment score (z score) of each cell type in each anatomical location (separately per view of the data). Such z score enrichment was computed using only the ISS cells with a cell-type label fraction above 0.8 (meaning that 80% of the scRNA-seq neighbours are annotated with the same cell-type label). The per-view enrichment scores were visualized using a heatmap.

Computational representation of the Müllerian and Wolffian rostrocaudal axes

Detailed information about the rationale and implementation for the computational representation of the Müllerian and Wolffian rostrocaudal axes can be found in Supplementary Notes 2, 3 and 4; here, we provide a summary.

To study the spatial gene-expression patterns along the developing female reproductive tract, we constructed the Müllerian rostrocaudal axis by measuring distances from key anatomical landmarks in spatially resolved transcriptomics data (10x Visium Cytassist and ISS), as described in the OrganAxis framework33. The axis spans from the fallopian fimbriae to the Müllerian vagina–vaginal plate junction, thereby capturing the differentiation of the Müllerian ducts. In cases when the full length of the uterovaginal canal could not be captured in a single section owing to the limitations of the 10x Visium Cytassist platform’s capture area, consecutive tissue sections were computationally stitched together103. This stitching involved manually overlapping consecutive sections using the image processing software Fiji (https://imagej.net/plugins/trakem2/) and applying affine transformations to align the sections, which were then concatenated into a single dataset. The resulting Müllerian rostrocaudal axis was normalized and rescaled, which enabled consistent comparison of gene-expression patterns across different samples of the female reproductive tract.

To extend our spatial analysis to single-cell resolution, we projected the Müllerian rostrocaudal axis onto scRNA-seq data by leveraging the single-cell resolution provided by ISS technology. We restricted our analysis to mesenchymal and epithelial compartments, which are key to understanding the development of the female reproductive tract. Using a modified kNN approach implemented in the iss-patcher102 library, each cell in the scRNA-seq dataset was assigned a position along the Müllerian rostrocaudal axis based on its proximity to cells in the ISS data. We then used the TradeSeq88 framework to model gene expression along the measured (10x Visium Cytassist) and imputed (scRNA-seq) Müllerian rostrocaudal axis, treating the axis analogously to pseudotime. Genes that showed significant changes in expression along the axis were identified using stringent criteria (P < 0.001, log[FC] > 0.5), and we prioritized those with specific expression in mesenchymal or epithelial cells.

In parallel, to investigate gene-expression patterns along the male reproductive tract, we constructed the Wolffian rostrocaudal axis, spanning the length of the epididymis and the initial segment of the vas deferens. This axis was derived using data exclusively from 10x Visium Cytassist, as we lacked sufficient ISS male samples for robust imputation of the axis onto scRNA-seq. The axis was similarly defined by calculating the normalized distance between the efferent ductules and the vas deferens. It was then rescaled to maintain consistency with the Müllerian rostrocaudal axis to facilitate comparative analyses. We used the TradeSeq88 framework to model gene expression continuously along the Wolffian rostrocaudal axis and used the same significance thresholds as for the Müllerian rostrocaudal axis to prioritize genes for which expression changes along the differentiating Wolffian ducts.

Prioritized spatially variable mesenchymal and epithelial genes in the imputed Müllerian rostrocaudal axis and measured Wolffian rostrocaudal axis were then used to filter the transcription factors and cell–cell communication events that probably drive Müllerian and Wolffian regionalization during fetal development.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

RELATED ARTICLES

Most Popular

Recent Comments