Experimental methods
Stem cell and organoid culture
We used the following induced pluripotent stem (iPS) cell line for all experiments (also see Supplementary Methods Table 1): histone2B–mEGFP that uniformly labels nuclei (cell line ID: AICS-0061-036, cl.036), mEGFP–β-actin that uniformly labels ACTB (cell line ID: AICS-0016-184 cl.184), mTagRFP–T-CAAX that labels cell membrane (cell line ID: AICS-0054-091, cl.091), mTagRFP–T-tubulin-α1b that labels TUBA1B (cell line ID: AICS-0031-035, cl.035), mTagRFP–T-laminB1 that labels LMNB1 (cell line ID: AICS-0034-062, cl.062) and unlabelled WTC iPS cells (cell line ID GM25256). NKX2.1–GFP/w human embryonic stem cells were obtained from A. Kirkeby’s research group at the University of Copenhagen, after the arrangement of an MTA with E. Stanley and A. G. Elefanty (Murdoch Childrens Research Institute). Stem cell lines were cultured in mTSR+ (mTeSR Plus, StemCell Technologies) with mTSR+ supplement (StemCell Technologies) and supplemented with penicillin–streptomycin (pen-strep; 1:200; 15140122, Gibco) on Matrigel-coated plates (354277, Corning). Cells were passaged 1–2 times per week using TryplE (12605010, Gibco) or EDTA in DPBS (final concentration of 0.5 mM; 12605010, Gibco). The cell culture medium was supplemented with 1:1,000 Rho-associated protein kinase inhibitor (ROCKi) Y-27632 (final concentration of 5 μM; 72302, StemCell Technologies) on the first day after passage. All cell lines were tested for mycoplasma infection regularly using PCR validation (Venor GeM Classic, Minerva Biolabs) and found to be negative. The organoid generation protocol (multi-mosaic and sparse organoids) was as follows: 500 cells (except Extended Data Fig. 1b, which has 3,000 cells) in mTSR+ (with 1:200 ROCKi and 1:200 pen–strep) were added per well of a 96-well plate (CLS7007, Corning) and centrifuged at 200g for 5 min to generate embryoid bodies. Fresh mTSR+ with 1:200 ROCKi and 1:200 pen–strep was exchanged on day 2. Fresh NIM with 2% dissolved Matrigel was supplied on day 4 and exchanged every other day, followed by differentiation medium without vitamin A (VitA) and with 2% Matrigel on day 10 and differentiation medium with VitA and with 1% Matrigel on day 15. Organoids were moved to 24-well plates on day 15, one organoid per well, and moved to a shaker, followed by moving one organoid per well to a 6-well plate at 1 month and kept on a shaker. The no-matrix organoids were cultured following the exact same conditions, without any addition of Matrigel at any point. For agarose embedding, organoids were embedded in 0.6% or 0.3% low-melting agarose (SeaPlaque agarose, 501010); stock was 1%, in PBS, diluted to 0.6% in NIM. The use of human embryonic stem cells for the generation of brain organoids was approved by the Ethics Committee of Northwest and Central Switzerland (2019-01016) and the Swiss Federal Office of Public Health. The composition of NIM, differentiation medium without VitA and differentiation medium with VitA was based on earlier work26.
Neural induction medium
To make 250 ml of NIM, the following were combined: 250 ml DMEM/F12, 2.5 ml N2 supplement, 2.5 ml Glutamax supplement, 2.5 ml MEM-NEAA, 50 μl heparin solution (5 mg ml−1) and 1.25–2.5 ml pen–strep, 0.22-μm filtered and stored at 4 °C for up to 2 weeks.
Differentiation medium without vitamin A
To make 250 ml of differentiation medium without VitA, the following were combined: 125 ml DMEM/F12, 125 ml neurobasal, 1.25 ml N2 supplement, 2.5 ml B27 without VitA supplement, 62.5 μl insulin, 227.3 μl 55 mM 2-mercaptoethanol solution, 2.5 ml Glutamax supplement, 1.25 ml MEM-NEAA and 2.5 ml pen–strep, 0.22-μm filtered and stored at 4 °C for up to 2 weeks.
Differentiation medium with vitamin A
To make 1,000 ml of differentiation medium with VitA, the following were combined: 500 ml DMEM/F12, 500 ml neurobasal, 5 ml N2 supplement, 10 ml B27 + VitA supplement, 250 μl insulin, 909.2 μl 55 mM 2-mercaptoethanol, 10 ml Glutamax supplement, 5 ml MEM-NEAA and 10 ml pen–strep, 0.22-μm filtered and stored at 4 °C for up to 2 weeks.
For the organoid time-course scRNA-seq (Figs. 1 and 2), brain organoids were generated from the histone2B–mEGFP cell line (cell line ID: AICS-0061-036). Organoids from days 5–11 belonged to the same batch, days 16–21 to a different batch and day 30 to a third batch. Multiple organoids of each line were pooled together to obtain a sufficient number of cells. For the early time points (days 5, 7 and 11), 24 organoids, each grown in an independent well of a 96-well plate, were pooled, decreasing to 12 organoids for day 16, 10 organoids on day 21 and 6 organoids on day 30. For the day 13 scRNA-seq (Extended Data Fig. 10) with Matrigel (11 organoids), no-matrix (18 organoids) and agarose embedding (11 organoids), the organoids were generated from the unlabelled WTC parent iPS cell line. Agarose was degraded using cell recovery solution (11543560, Corning) at 4 °C. For the experiments with YAP activator (Fig. 5e), 16 control organoids, 23 organoids from Py-60 (given on day 5) and 20 organoids from Py-60 (given on day 7) were hashed together and used for scRNA-seq. For the experiments in Extended Data Figs. 11 and 12), 5 control organoids (Matrigel), 4 no-matrix organoids and 13 organoids from Py-60 treatment were hashed and used for scRNA-seq. For the experiments with control and WLS-KO organoids (Fig. 5), organoids were generated from the control and KO cell lines (see below), 12 control and WLS-KO organoids each from Matrigel and no-matrix conditions, and 8 control and WLS-KO organoids each from Matrigel condition (DMSO control, CHIR99021 and Py-60) were hashed together and used for scRNA-seq.
For scRNA-seq (Fig. 5e,f), sparse and multi-mosaic organoids containing cells labelled with nuclear membrane (lamin, RFP), actin (GFP) and tubulin (RFP) and unlabelled cells were cultured as described above with the addition of 1:1,000 DMSO in NIM to control organoids on day 5 and 10 µM Py-60 (HY-141644, MedChem Express) to organoids on day 5 or day 7. Media with fresh inhibitor or control media were exchanged every other day until dissociation and sequencing on day 10. For scRNA-seq (Extended Data Figs. 11c–f and 12c–g), embryoid bodies were generated using the unlabelled WTC-11 parent iPS cell line from 500 cells aggregated in mTSR+ (1:200 ROCKi and 1:200 pen–strep), per well of a 96-well plate, and centrifuged at 200g for 5 min to generate embryoid bodies. Fresh mTSR+ with 1:200 ROCKi and 1:200 pen–strep was exchanged on day 2 and day 4. Fresh NIM was supplied on day 6 and exchanged every other day, followed by differentiation medium without VitA on day 10 and differentiation medium with VitA on day 15. Organoids were given 2% Matrigel, 10 µM Py-60 with 2% Matrigel or no-matrix on day 10. All media were exchanged on day 13 followed by dissociation and sequencing on day 16. For the scRNA-seq in Fig. 5h–k, organoids were generated from WLS-KO and control cell lines with small modifications: 500 cells in mTSR+ (with 1:200 ROCKi and 1:200 pen–strep) were added per well of a 96-well plate (CLS7007, Corning) and centrifuged at 200g for 3 min to generate embryoid bodies. Fresh mTSR+ with 1:200 ROCKi and 1:200 pen–strep was exchanged on day 2. Fresh NIM with 2% dissolved Matrigel was supplied on day 4 (Matrigel condition) and exchanged every other day. The organoids were given differentiation medium without VitA with 1% dissolved Matrigel from day 10 to day 18 with fresh media exchanged every other day. On day 10, the Matrigel organoids were divided into Matrigel, Matrigel (DMSO control), Matrigel (CHIR99021) and Matrigel (Py-60) conditions. The organoids were treated with 1:1,000 DMSO, 2 µM CHIR99021 (2-day pulse) or 10 µm Py-60 (4-day pulse) in differentiation medium without VitA. CHIR99021 was washed on day 12, with fresh media exchanged, Py-60 and DMSO were refreshed in media exchanged on day 12 for an additional 2 days and washed on day 14. Organoids were given differentiation medium with VitA with 1% Matrigel on day 18 and moved to 24-well plates, one organoid per well, and were not moved to a shaker until use for scRNA-seq on day 55. No-matrix organoids were treated the same without any Matrigel addition and without Chiron or PY-60 treatments.
WLS-KO and control line generation
The human iPS cell line WTC-TUBA1BmTAgRFP-T (cell line ID: AICS-0031-035, cl.035) was used to create the WLS-KO line. Two guides were designed using the scoring system from CHOPCHOP49 and the IDT Custom Alt-R CRISPR–Cas9 guide RNA generator tool. The following two guides were selected to target the WLS gene: ACTCAGCAAACGCGTCATCACGG and ACGAGCGGAACCACATCGCAGGG. The Alt-R CRISPR–Cas9 System (IDT) was used for guide delivery with electroporation, using the Lonza 4D-Nucleofector X Unit with the 20 µl 16-well strips according to the manufacturer’s protocol. To form the CRISPR RNA (crRNA)–trans-activating crRNA (tracrRNA) complex in a final concentration of 2.3 µM for each guide complex, 0.6 µl of each guide crRNA was combined with 0.6 µl of tracrRNA in a separate tube. The 1.2 µl of each crRNA–tracrRNA complex was then combined with 1 µl of Alt-R S.p. HiFi Cas9 Nuclease V3 (10 µg µl−1; 1081060, IDT) and 0.3 µl of Duplex Buffer (IDT) in separate tubes. For electroporation, the two loaded RNP complexes were combined with 1 µl electroporation enhancer and added to 20 µl P3 (V4XP-3032, Lonza) containing 2 × 105 cells. Cells were electroporated using the program H9. After electroporation, cells were incubated with 80 µl mTSR+ (100-0276, StemCell Technologies) with CloneR (final concentration of 1:1; 05888, StemCell Technologies) for 10 min in the 16-well strip and later split into 3 wells of a 12-well plate coated with Matrigel (35248, Corning) containing 0.5 ml mTSR+ (100-0276, StemCell Technologies) with CloneR (final concentration of 1:10; 05888, StemCell Technologies). After 24 h, the medium was replaced with 0.5 ml mTSR+ (100-0276, StemCell Technologies) supplemented with 1:200 pen–strep, and the cells were allowed to recover for 72 h. The cells were passaged onto CellAdhere Laminin-521 (77003, StemCell Technologies) coated well in StemFlex (A3349401, Gibco) with Y-27632 (final concentration of 10 µM; 72302, StemCell Technologies). Single clones were then generated using iotaSciences IsoCell according to the manufacturer’s protocol. Single-cell solution with 7,500 cells per millilitre was prepared for the IsoCell and two provided grids prepared. Singularity of clones was tracked using an EVOS XL Core over the course of 7 days. Single clones were harvested in StemFlex (A3349401, Gibco) with CloneR (final concentration of 1:10; 05888, StemCell Technologies) and distributed into a 96-well plate coated with Matrigel (35248, Corning). After 24 h, medium was exchanged to mTSR+ (100-0276, StemCell Technologies) supplemented with 1:200 pen–strep. After 48 h, clones were passaged to 12-well plates using 90% of the cell suspension, and 10% was used for validation of frameshift mutations by sequencing and analysed with CRISPResso2 (ref. 50). Clones were then cryopreserved. The control line and WLS-KO line used in this study were karyotyped and showed a normal karyotype.
Organoid dissociation and scRNA-seq
For all experiments, single-cell suspensions were generated by dissociation of the organoids with a papain-based neural dissociation kit (130-092-628, Miltenyi Biotec). In brief, organoids were washed three times with HBSS without Ca2+ and Mg2+ (37250, StemCell Technologies). Pre-warmed papain solution (1–2 ml) was added to the organoids and incubated for 15 min at 37 °C. The tissue pieces were triturated 5–10 times with 1,000 µl wide-bore and then P1,000 pipette tips. The tissue pieces were incubated twice for 10 min at 37 °C with additional trituration steps in between and after with P200 and P1,000 pipette tips. Cells were filtered consecutively with a 30-µm or 40-µm filter, centrifuged at 300g for 5 min and resuspended in cold PBS. The viability and cell count for the single-cell suspensions were assessed using a Trypan Blue assay on the automated cell counter Countess (Thermo Fisher Scientific). Cell suspensions from days 5, 7, 11, 16 and 21 were cryopreserved in Bambanker (BBH03, Nippon Genetics Europe) and stored at −20 °C until the scRNA-seq experiments were performed. The cryopreserved single-cell suspensions of each time point were thawed by warming up the cryo for 1–2 min in a water bath at 37 °C and directly centrifuged in 10 ml pre-warmed DMEM with 10% FBS. Cells were washed twice with PBS + 0.04% BSA and filtered through a 40-µm cell strainer (Flomi). For scRNA-seq, cells were resuspended to a final concentration after counting and viability checking that enabled targeting 8,000 cells and, in case the cell numbers were not sufficient, all cells were loaded. The scRNA-seq libraries were generated using the Chromium Single Cell 3′ V3 Library & Gel Bead Kit. Single-cell encapsulation and library preparation were performed according to the manufacturer’s protocol.
Cell hashing and scRNA-seq
For scRNA-seq (Fig. 5 and Extended Data Figs. 11 and 12), single-cell suspensions were obtained after organoid dissociation (described above) and cells of different samples were multiplexed using Cell Hashing51. Cell hashing was performed as described in the TotalSeq-A Antibodies and Cell Hashing with 10x Single Cell 3’ Reagent Kit v3.1 (Dual Index) protocol (https://www.biolegend.com/fr-ch/protocols/totalseq-a-dual-index-protocol). In brief, approximately 350,000 cells were resuspended in 45 μl DPBS + 0.5% BSA, 5 µl of Human TruStain FcX (Fc Receptor Blocking Solution, 422302, BioLegend) and cells were incubated for 10 min on ice. After blocking, 2 µl (1 µg) of each TotalSeq-A anti-human Hashtag antibodies (A0251-A065, BioLegend) were added per sample and were incubated for 30 min on ice with gentle agitation every 10 min. The cells were then washed twice with DPBS + 0.5% BSA and resuspended in 40 µl of DPBS + 0.5% BSA. Cells were counted and combined in equal ratios before processing them using Chromium Next GEM Single Cell 3′ Reagent Kits v3.1 (10x Genomics) according to the user guide (CG000206 Rev D, 10x Genomics). The library preparation of the HashTag Oligos was performed according to the TotalSeq-A Antibodies and Cell Hashing with 10x Single Cell 3’ Reagent Kit v3.1 (Dual Index) protocol. The libraries were sequenced according to the manufacturer’s guidelines on the Illumina NovaSeq platform.
Indirect iterative immunohistochemistry (4i)
The complete list of antibodies used for the 4i cycles is provided in Supplementary Methods Tables 2 and 11. For a detailed list of buffer compositions, see Supplementary Table 12. The 4i protocol was based on previous work34,52 and was as follows: organoids were fixed in 4% paraformaldehyde at 4 °C after harvesting, whereafter they were stored in 70% EtOH at −20 °C until use. Organoids were first embedded in a drop of 2% agarose due to their small size, then stored in 70% EtOH. The samples were then embedded in paraffin using a robot (TPC15, Medite) at 37 °C. For plate preparation, a Schott Nexterion 110 × 75 mm #1.5 glass plate was prepared in the following way: 10′ plasma asher for cleaning and surface activation, and immediately after, functionalization with poly-l-lysin (0.1 mg ml−1) with incubation for 1 h at room temperature. Next, it was rinsed twice with PBS, dried and stored dust free. Paraffin-embedded sections were cut at 3 µm, loaded onto the glass plate followed by drying at 40 °C overnight. They were then moved to 60 °C for 1 h and immediately placed in a Neoclear bath for de-paraffination. The plate was treated with 3′ NeoClear I, 3′ NeoClear II, 3′ 100% EtOH I, 3’ 100% EtOH II, 3′ 96% EtOH, 3′ 70% EtOH and 5′ ddH2O. This was followed by a 4% paraformaldehyde fixation (incubation for 15 min at room temperature) and rinsed with ddH2O. Aldehyde groups were blocked with 50 mM NH4Cl (incubation for 45 min at room temperature) and then the plate was rinsed with ddH2O. This was followed by heat-induced antigen retrieval with 10 mM citrate and 0.05% Tween-20 at pH 6.0. The plate was then heated up to 90 °C for 20 min with a histological microwave, then slowly cooled (in the microwave) to room temperature overnight. The plate was then fixed on a single-well frame with tape adhesive (Pentel tape ‘n glue) and sealed water tight with rubber cement. The samples were rinsed with 5% glycerol to prevent drying. The plate was then rinsed with ddH2O and PBS and stored with PBS at 4 °C until the start of the staining cycles. 4i cycles were then executed on a Felix Robot. Plate was rinsed three times with ddH2O and three times with 10 ml elution buffer for 10 min at room temperature at 300 rpm. It was washed three times with PBS. Blocking was done with 1.5% BSA, 0.1% Triton X100 and 0.1 M maleimide for 60 min at room temperature at 300 rpm. The plate was rinsed with PBS (to remove maleimide). Primary antibody hybridization was done in 1% BSA with 0.1% Triton X-100 for 45 min at room temperature at 300 rpm. The plate was washed three times with 1× PBS. Secondary antibody hybridization was done in 1% BSA and 0.1% Triton X-100 for 30 min at room temperature at 300 rpm. The plate was washed with PBS. The plate was rinsed two times with ddH2O followed by addition of imaging buffer.
Indirect iterative immunohistochemistry (4i) image acquisition
The 4i imaging was done on samples mounted on a large glass plate (110 × 75 mm) that was glued under a SBS-size superstructure. Imaging was performed on a Nikon Ti2 inverted microscope, coupled to a Crest X-Light V3, equipped with Teledyne Kinetix back-thinned cameras. The objective used was a Nikon Apochromat 40×/1.15 water immersion LWD (MRD77410) with water supply at 60 µl h−1 for 25 h. The following imaging conditions were used: DAPI at 80% power (100 ms), GFP at 80% power (300 ms), RFP at 80% power (300 ms) and Cy5 at 80% power (300 ms).
Light-sheet microscopy
All cell lines used for imaging were procured from the Coriell Institute as described above. For imaging, embryoid bodies were embedded in 20–50% Matrigel in neural-induction medium on day 4, one organoid per microwell and up to 16 organoids in one sample chamber. After gelification of Matrigel, NIM was added to the sample chamber and exchanged every other day. For the imaging experiments with ECM perturbations in Figs. 2 and 3, the sample mounting chamber has a vertical separation to segment it into 4 sub-chambers containing 4x organoids each. Embryoid bodies (4) were either embedded in 20–50% Matrigel dissolved in NIM, or 4 embryoid bodies were covered with 0.6% low melting agarose and for the no-matrix condition, 8 embryoid bodies were added to the sample chamber without any embedding. For the imaging shown in Extended Data Fig. 2d, embryoid bodies were aggregated from HES3 line (NKX2-1:GFP) using iPS Brew and rock inhibitor (1:200). After one day of aggregation, the embryoid bodies were transferred to a sample mounting chamber and embedded in Matrigel. A neural patterning medium (described in ref. 53) was employed for 14 days, complemented with SB432542 (Miltenyi, 130-106-543) and rh-Noggin (Miltenyi, 130-103-456) mediated dual SMAD inhibition from day 0 to day 9. After this, medium was changed to neural differentiation medium with VitA (composition previously described26) until day 21. The organoids were treated with the average morphogen concentration of SHH (140 ng/ml, Miltenyi, 130-095-727) from day 3–14 together with purmorphamine (0.21 µM, Miltenyi, 130-104-465). 3.5 µM XAV939 (Miltenyi, 130-106-539) was added from days 0–9. Medium was exchanged after every 2 days. Imaging was done with the LS1 Live light-sheet microscope developed by Viventis Microscopy, using a 25× objective demagnified to 18.5×, with a field of view that was approximately 710 µm and xy pixel size of 0.347 µm. Successive z steps were acquired every 2 µm for 201 steps. The frame rate for acquisition was 30 min for Fig. 1 and Extended Data Figs. 2e and 3a–c. The frame rate for acquisition was 60 min for Figs. 2, 3h–o and 5g and Extended Data Figs. 2f, 3, 4d-j, 5, 6e, 7 and 12h.
Fixation and whole-mount HCR
For whole-mount staining, organoids were fixed overnight at 4 °C on the nutator, washed 3–5 times in PBST, dehydrated with a PBST–methanol gradient (50% and 100%) and stored at −20 °C in 100% methanol until use. All probe sets were designed and provided by Molecular Instruments. The amplifiers and buffers were also ordered from Molecular Instruments (https://www.molecularinstruments.com/). HCR was performed according to the manufacturer’s protocol provided by Molecular Instruments with small changes. All five hairpins (B1–B5) conjugated with the following dyes were used per experiment (Alexa-488, Alexa-514, Alexa-545, Alexa-594 and Alexa-639). In brief, the samples were rehydrated with a series of graded methanol–PBST washes (25%, 50%, 75% and 100%) for 5 min each at 4 °C on the nutator and washed an additional time with PBST. The samples were then treated with 10 µg ml−1 proteinase K (25530-049, Invitrogen) for approximately 3–5 min at room temperature followed by two times 2× PBST washes for 5 min. They were then post-fixed with 4% paraformaldehyde for 20 min at room temperature and washed three times with PBST for 5 min each. The organoids were pre-hybridized in the probe hybridization buffer for 30 min at 37 °C. Of each probe set, 1 pmol was diluted into probe hybridization buffer and the samples were incubated overnight at 37 °C. The next day, the samples were washed four times with the probe wash buffer at 37 °C and washed two more times with 5× SSCT. The organoids were then incubated in the amplification buffer for 10 min at room temperature followed by adding snap-cooled hairpin mixture diluted in the amplification buffer to incubate overnight at 25 °C. The excess hairpins were washed the next day with 2 × 5 min washes as well as two longer washes of 30 min followed by 1 × 5 min wash with 5× SSCT buffer at room temperature. Organoids were stained with DAPI (1 µg µl−1) during the first 30 min washes. The samples were stored at 4 °C and mounted on a µ-Slide chamber (80807, Ibidi) and covered with 1% agarose. The samples were imaged using a ×10 water immersion and 0.8 NA objective on the Zeiss LSM980 Airyscan system. Images were acquired using lambda scanning followed by spectral unmixing to image 6 channels in three sets of excitation (first round 514 nm + 639 nm, 2nd round 488 nm + 545 nm + 594 nm, and 3rd round 405 nm). All images were processed using Fiji and the BigDataViewer plugin54,55.
Bulk CUT&Tag for YAP1
Organoids were generated from the WTC-11 iPS cell line by culturing embryoid bodies (as described above) for 6 days, following which NIM was added on day 6. On day 10, differentiation medium without VitA was given with 2% dissolved Matrigel. Single-cell suspensions of 12-day-old organoids were prepared using the Miltenyi Neural Tissue Dissociation Kit (P) (130-092-628) following the manufacturer’s guidelines. Cells were counted and directly transferred into CUT&Tag Wash buffer supplemented with 0.01% digitonin (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine and 1× Roche protease inhibitor cocktail). Per experiment, 1 million cells were used and incubated with 2 µg YAP antibodies (ab52771, Abcam and sc-101199, Santa Cruz). All following steps were performed as previously described56,57, except that the Tn5 incubation and cutting were performed at a NaCl concentration of 150 mM. To control unspecific cutting, we performed the experiment without antibody (Tn5 only) and with a generic anti-rabbit antibody. The proteinA-Tn5 was purified in house as previously described56. Final libraries were sequenced on the NovaSeq platform with PE 2 × 50-bp read length. Sequenced reads were mapped against hg38 using Bowtie2 (ref. 58). These were filtered for PCR duplicates and mapping quality. Coverage tracks were then generated using deeptools2 bamCoverage and normalized by sequencing depth59. Tracks were visualized using IGV.
RT–qPCR for YAP1 perturbation screen
Organoids were generated from the WTC-11 iPS cell line as described above and cultured without Matrigel. Organoids were treated with 1:1,000 DMSO (control), YAP1 activators (10 μM TRULI, 10 μM GA-017 and 10 μM Py-60) or inhibitor (10 μM TED-34) in differentiation medium without VitA on day 10 and refreshed on day 13. Eight organoids per condition were harvested for qPCR in TRIzol and used for RNA extraction. Of each sample, 50 ng RNA was reversely transcribed into cDNA using the ReadyScript cDNA Synthesis Mix (RDRT, Sigma-Aldrich), with the following cycler programme: 5 min at 25 °C, 30 min at 42 °C, 5 min at 85 °C and hold at 4 °C in a C1000 Thermal Cycler (Bio-Rad). RT–qPCR was then performed for WLS gene expression quantification and GAPDH (used as an internal normalization control). The WLS primers were forward-gccagctatgagcaaagtcc and reverse-tgggatggtgcatacaagaa. The GAPDH primers were forward-GGAGCCAAACGGGTCATCATCTC and reverse-GAGGGGCCATCCACAGTCTTCT.
RT–qPCR was performed using KAPA SYBR FAST qPCR Kit (KK4602, Kapa Biosystems) according to the manufacturer’s instructions with the following cycler programme: 3 min at 95 °C; 45 cycles of 3 s at 95 °C, 20 s at 60 °C and 10 s at 72 °C; 10 s at 95 °C; 1 min at 65 °C; and 1 s at 97 °C in a LightCycler96 (Roche).
Synthetic PEG hydrogel
Eight-armed poly(ethylene glycol) (PEG) with vinyl sulfone (VS; 8-PEG-VS) end group (hexaglycerol octa(vinylsulfonylethyl) polyoxyethylene) was purchased from NOF (40 kDa). Eight-armed PEG with thiol functionality containing sortase-sensitive peptide sequence (8-PEG-SS-SH) was synthesized following a previous protocol60. The effective functionality was checked by 1H-NMR and Ellman assay for thiol concentration. The thiol-containing peptides used in this study, GFOGER (GGYGGGPG(GPP)5GFOGER(GPP)5GPC), scrambled control peptide RDG (GRCGRDGSPG) and dithiol SrtA sensitive (DSH-SS; GCRELPRTGERCG) were purchased from Biomatik. The hydrogel (1.75% w/v PEG content to reach elastic modulus of approximately 100 Pa) was formed by reacting 8-PEG-VS with 8-PEG-SS-SH and adhesion peptide (1 mM). For some conditions, laminin-111 (Trevigen) was incorporated into the final gel mixture at 0.6 mg ml−1. The organoids were embedded in the hydrogel mixture on a light-sheet membrane and left to polymerize for 30 min at 37 °C. After gelation, the culture media were then added to the hydrogel.
Sortase A production and gel degradation
Sortase A (SrtA; plasmid #75144, Addgene) with a 6×-His tag was expressed in E. coli BL21(DE3) following a published protocol61. The sortase degradation solution was prepared following a previous protocol60. In brief, a mixture of SrtA (15 µM) and triglycine (GGG, 200 mM; Sigma) was added to the NIM (day 8) and differentiation medium without VitA (day 13) samples. The degradation solution was added to the PEG hydrogels and left to degrade for 30–60 min at 37 °C until complete dissolution. The organoids were collected for further analysis.
Bulk RNA-seq analysis
Organoids frozen in 50 μl TRIzol Reagent (15596018, Thermo Fisher Scientific) were rapidly shaken until the tissue was completely dissolved. The volume of TRIzol was adjusted to 500 μl and total RNA was extracted with 100 μl of chloroform by rapid shaking for 15 s and subsequently centrifugation at 12,000g at 4 °C for 15 min. The resulting aqueous phase was re-extracted with an equal chloroform volume. Following centrifugation, the aqueous phase was mixed with an equal volume of isopropanol and RNA was precipitated along with the GlycoBlue coprecipitant (AM9516, Thermo Fisher Scientific) for an average of 4 days at −20 °C. Subsequently, the samples were centrifuged at 12,000g at 4 °C for 30 min. The resulting RNA pellets were washed two times with ice-cold 80% ethanol, then dried on ice for 10 min and resuspended in 20 μl of the TURBO DNA-free (AM1907, Thermo Fisher Scientific) mix. Following DNA digestion for 30 min at 37 °C and DNase inactivation according to the manufacturer’s guidelines, the dissolved RNA was centrifuged at 10,000g for 1.5 min at 4 °C. RNA aliquots were immediately frozen on dry ice and subsequently stored at −80 °C. RNA traces and concentrations were examined using the Qubit RNA BR Assay kit (Q10211, Thermo Fisher Scientific) and RNA 6000 Pico Kit (5067-1513, Agilent Technologies). RNA dilutions for normalization purposes were prepared using RNase-free water in a 96-well plate. The cDNA was prepared and amplified according to the Smart-seq2 protocol62. An average of 10–15 ng of RNA (1 µl) was used as input into the reaction. The cDNA was purified using the SPRIselect reagent (B23318, Beckman Coulter) at the ratio of 0.8×, and the resulting cDNA traces and concentrations were examined using the Bioanalyzer DNA High Sensitivity kit (5067-4626, Agilent Technologies) and Qubit dsDNA High Sensitivity kit (Q32854, Thermo Fisher Scientific). Of normalized cDNA, 1.25 μl was used to construct Nextera libraries. The single organoid bulk RNA-seq libraries were subsequently pooled (3 µl each) and purified two times using the SPRIselect reagent (B23318, Beckman Coulter) at 0.9× ratio. The libraries were sequenced according to the manufacturer’s guidelines on the Illumina NovaSeq platform.
Data analysis methods
Preprocessing of scRNA-seq data from the organoid time course
We used Cell Ranger (v3.0.2) with the default parameters to obtain transcript count matrices by aligning the sequencing reads to the human genome and transcriptome (hg38, provided by 10x Genomics, v3.0.0). Count matrices were further preprocessed using the Seurat R package (v4.3.0) and R (version 4.4.0)63. First, cells were filtered on the basis of the number of detected genes and the fraction of mitochondrial genes. As sequencing depth varied between time points, the threshold of the number of detected genes was set individually for each sample. For the scRNA-seq time course, the number of detected genes was between 1,000 and 7,500 and the mitochondrial genes threshold was less than 10%. Three thousand variable features were used and the number of PCA was set to 50. For Fig. 1b–d, the total number of cells per day after preprocessing were day 5 = 5,481, day 7 = 8,183, day 7 = 4,912, day 16 = 6,571, day 21 = 7,962 and day 30 = 7,950.
Integration of different time points
Integration of time points was performed using the log-normalized gene expression data of meta cells. We used the union of the selected genes and transcription factors and further excluded cell-cycle-related genes from the set. Next, we computed cell-cycle scores using the Seurat function CellCycleScoring(). Subsequently, the data were z-scaled, cell-cycle scores were regressed out (ScaleData()) and principal component analysis (PCA) was performed using the Seurat function RunPCA(). We used the first ten principal components (PCs) to integrate the different time points in the dataset using the cluster similarity spectrum (CSS) method. To remove any remaining cell-cycle signal for any downstream tasks, we again regressed out the cell-cycle scores from the integrated CSS matrix. To obtain a 2D representation of the data, we performed UMAP64 embedding using RunUMAP() with spread = 1.0, min.dist = 0.4 and otherwise the default parameters.
Pseudotime analysis of the early progenitor transition trajectory
Cells from day 5, day 7 and day 11, which were annotated as neuroectodermal (both neuroectoderm and late neuroecotoderm) were subsetted. CSS was used to re-integrate cells from different time points, and PCA was performed on the CSS representation to further reduce embedding dimensions to 10. A linear model was fit for each PCA-reduced CSS dimension, each with the cell-cycle scores as independent variables. Residuals of the linear models were obtained as the cell-cycle-regressed-out PCA-reduced CSS representation, which was then used as the input of the diffusion map (implemented in the destiny R package). The first diffusion component (DC1) was ranked across cells in the subset, and the ranked DC1 was used as the pseudotime representing the early progenitor transition progression.
Identification of genes with expression changed during early progenitor transition
For each gene detected in more than 1% of cells during early progenitor transition, two linear models were fit. The full model includes the natural spline (degree = 5) of pseudotimes, whereas the reduced model only contains intercept. Two tests were performed to compare the two models. The relaxed test used F-test to compare regression mean squares of the two models. The stringent test used F-test to compare mean square errors of the two models. To correct for multiple testing, the Bonferroni method was applied to the relaxed tests, and the Benjamini–Hochberg method was applied to the stringent tests. Genes with adjusted P < 0.05 in relaxed tests were defined as the broad set of genes with expression changed during the transition, whereas those with adjusted P < 0.01 in stringent tests were defined as the stringent set.
Preprocessing of scRNA-seq data from all other datasets
We used Cell Ranger (v3.0.2) with the default parameters to obtain transcript count matrices by aligning the sequencing reads to the human genome and transcriptome (hg38, provided by 10x Genomics, v3.0.0). Count matrices were further preprocessed using the Seurat R package (v4.3.0). The cells were filtered on the basis of the number of detected genes and the fraction of mitochondrial genes. As sequencing depth varied between time points, the threshold of the number of detected genes was set individually for each sample. For the matrix perturbation dataset (Extended Data Fig. 10), the number of detected genes was between 500 and 8,000, 3,000 variable features were used and the number of PCA was set to 50. Mitochondrial and histone-related genes were removed followed by integration. The total number of cells per condition after pre-processing were Matrigel = 8,095, no-matrix = 8,343 and agarose = 3,725. To obtain a 2D representation of the data, we performed UMAP embedding using RunUMAP() with the default parameters. For the YAP perturbation (Fig. 5), the number of detected genes was more than 200 and the mitochondrial genes threshold was set to less than 20%. Three thousand variable features were used and the number of PCA was set to 50. Mitochondrial, ribosomal and histone-related genes were removed followed by integration. The total number of cells per condition after preprocessing was control = 2,085, Py-60 given on day 5 = 763 and Py-60 given on day 7 = 1,955. To obtain a 2D representation of the data, we performed UMAP embedding using RunUMAP() with spread = 0.3 and min.dist = 0.5. For the WLS-KO scRNA-seq dataset in Fig. 5h–k, the number of detected genes was more than 500 and percent.mt of less than 10. Mitochondrial and ribosomal genes were removed followed by integration. Three thousand variable features were used and the number of PCA was set to 20. The total number of cells detected per condition after demultiplexing and preprocessing were Matrigel (wild type (WT)) = 3,560, Matrigel (WLS KO) = 1,661, no-matrix (WT) = 1,283, no-matrix (WLS KO) = 4,094, Chiron (WLS KO) = 1,774, Py-60 (WLS KO) = 1,383, Chiron (WT) = 3,310, Py-60 (WT) = 1,399, DMSO control (WT) = 2,087 and DMSO control (WLS KO) = 1,453. To obtain a 2D of the data, we performed UMAP embedding using RunUMAP() with spread = 0.6 and min.dist = 0.4. For the scRNA-seq datasets in Extended Data Figs. 11 and 12, the number of detected genes was more than 1,000 and percent.mt of less than 10. Mitochondrial, ribosomal and histone-related genes were removed followed by integration. Three thousand variable features were used and the number of PCA was set to 50. The total number of cells per condition after preprocessing were Matrigel = 1,183, no-matrix = 1,285 and YAP activator Py-60 = 1,189). To obtain a 2D representation of the data, we performed UMAP embedding using RunUMAP() with the default parameters. For all datasets, transcript counts were normalized to the total number of counts for that cell, multiplied by a scaling factor of 10,000 and subsequently natural-log transformed (NormalizeData()). The datasets in different experiments were integrated using CSS65.
Bulk RNA-seq analysis
Bulk RNA-seq reads were mapped to GRCh38 human genome using STAR (v2.7.11b). For each sample, the transcripts per million (TPM) were calculated for genes annotated in Ensembl v93 (filtered using the build steps in Cell Ranger Human reference 3.0.0, GRCh38) using RSEM (v1.2.28, rsem-calculate-expression). For each highly variable gene, a linear model was fit, with log-transformed TPM in Matrigel and no-ECM samples as the response variable, sample age and ECM condition as the independent variables. The coefficient of the ECM condition was effectively the approximation of overall log-transformed fold change (logFC) of gene expression in Matrigel in relative to no ECM. Next, the similar coefficients were calculated for the same genes, but for samples with the PEG-RDG condition and one of the PEG-GFO, PEG-LAM and PEG-LAM-GFO condition, as the estimation of transcriptomic effect by the addition to PEG. Spearman correlation coefficient was then calculated between Matrigel-to-no-ECM logFC and each of PEG-GFO, PEG-LAM and PEG-LAM-GFO with PEG-RDG, across the highly variable genes. The correlation quantifies how well the transcriptomic effect by Matrigel was recapitulated by the addition of PEG.
For each expressed gene, three linear models were fit, all of which use the log-transformed TPM as the response variable and sample age plus ECM condition as the independent variables. All the three models were fit using samples with different PEG media, but using samples at both day 8 and day 13, only day 8 and only day 13. ANOVA analysis of variance) was applied to each of the model for the F-test-based P value of ECM conditions. Benjamini–Hochberg multiple test correction was applied to tests using the same sample set. Genes with adjusted P < 0.05 in at least one of the three tests were considered as differentially expressed genes (DEGs) in PEG conditions. For each of the DEGs, logFC between each PEG conditions with PEG-RDG, as well as between Matrigel and no-ECM samples at each time point was calculated.
For each gene detected in more than half of the samples or average TPM across all samples larger than one (defined as expressed genes), the mean and coefficient of variation of expression were calculated. A generalized linear model with gamma distribution was fit between coefficient of variation and 1/mean for highly expressed genes among those with coefficient of variation > 0.3 (>median). Given the model, a Chi-square test was applied to each gene. Benjamini–Hochberg multiple test correction was applied to all the tested genes. Genes with the adjusted P < 0.05 were considered as highly variable genes.
Light-sheet image analysis
Denoising and background subtraction
To increase the comparability between images in the dataset, all movies were denoised using Noise2void66 (v0.3.1). A separate 2D denoiser was trained for every position and every channel. The denoiser was trained on randomly selected z slices from throughout each corresponding movie and channel. For the datasets used in Figs. 1e–l and 3b,c,f,g and Extended Data Fig. 2e, 30 randomly selected z slices were used, and for the dataset in Figs. 2g–j and 3d,h–o and Extended Data Fig. 4d–j, 90 random z slices were used. The randomly selected z slices were split into a training subset with 90% of the z slices and a test subset with 10% of the z slices. The denoiser was trained for 128 epochs on the training subset with 96 × 96 pixel patches, with a batch size of 256, a unet_kern_size of 3, train_loss equal to mse and n2v_perc_pix of 0.198. The training progress was monitored on the test subset. The trained denoiser was then used to denoise all images in the movies. After this, the background subtraction was performed by using the minimal value along the z axis and clipping small intensity values. The Extended Data Fig. 2e dataset was further dehazed using the dehazing function from dexp67 with a filter size of 60 and correct_max_level set to false. For Fig. 1g, the 3D images were bi-linearnly rescaled to an isotropic voxel size of 1.15 μm. Then, a wedge mask was created with a triangular cutout and then multiplied with the downscaled image. The image was rotated using Scipy68 (v1.7.3) ndi.rotate function. The images were then attenuated as implemented in the dexp.processing.color.projection function with the parameters attenuation of 0.0005, attenuation_min_density of 0.3 and attenuation_filtering of 24. For the spinning movie, in Supplementary Video 2, the 3D image was bi-linearnly rescaled, with anti-aliasing, to an isotropic voxel size of 1.388 μm and 32 zero-padding planes were added to before and after in x, y and z directions. The image was then rotated with the scipy.ndi rotate function, and after each rotation, each channel was attenuated with the parameters attenuation_filtering = 4, attenuation_min_density = 0.002 and attenuation = 0.01.
Temporal registration
The light-sheet dataset from Fig. 1e–l and Fig. 3b,c,f,g was registered as follows. First, the movies were cropped and centred using the cropping algorithm implemented in LStree (v0.1)27. As only one organoid is present per position, the largest object was linked throughout the movie for cropping. After cropping, the images were padded with zeros, 32 planes before and after the image in the z direction, and with 96 planes before and after the image in both x and y direction before registration. For the temporal image registration, translational registration from itk-elastixs (v5.2.1)69,70 with the default translation parameter map was used to register the movies using GFP as the leading channel. The first frame in the movie was used as a reference frame and then the second image was registered to it. After this, the translation was applied to both channels of the second image. Then, the next image was registered to the previously registered frame in the movie, and again the translation was applied to both channels in the next image. This was repeated until all images in a movie were registered. After translation, the images were clipped to ensure that no small negative values remained. Movies in the positions 9–16 contained some higher-intensity areas. To deal with this, before calculating the registration, the image intensities were rescaled to the 1st and 99th percentile, and the default translation parameter map with three resolutions was used for registration. The calculated translation was then applied to both non-rescaled images for these positions.
Before quantifications and tracking, the light-sheet movies used in Figs. 2 and 3d were linearly downscaled to half resolution and a multi-frame comparison registration approach was used. The mean phase cross-correlation of the maximum intensity projection of the GFP channel across each axis was used to calculate the translation in x, y and z between any pair of images. This approach was used to calculate the translations across three consecutive time points for every frame in the movie. On the basis of this multiframe translation matrix, the averaged translation for each time point was calculated. The translations were then rounded to the next integer, and zero padding was calculated for the whole movie. After this, the translations were applied to all channels and masks by shifting the image along each axis and adding padding to prevent data loss.
Tissue segmentation
A pixel-wise random forest classifier was used for the tissue segmentation similar to other already established tools such as Weka segmentation71. Owing to the multi-terabyte size of the dataset, a Python pixel-wise segmentation classifier using both scikit-learn (v0.18.3)72 and scikit-image (v1.1.1)73 was implemented (https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_trainable_segmentation.html#id4). The input for the random forest classifier were 2D image features from the sum of both channels, which were extracted using multiscale basic features function from scikit-image with a minimum sigma of 1, a maximum sigma of 64, and intensity, edge and texture features. The input image was bi-linearly downscaled to one-quarter resolution in xy. To train the classifier, random z slices from throughout the movies were annotated with bounding boxes in jupyter notebook using bboxwidgets (v0.4.0), as either background, lumen or organoid tissue (https://github.com/gereleth/jupyter-bbox-widget). Random slices were iteratively annotated until satisfactory image segmentations were produced. In total, 1,414 slices were corrected or annotated for the dataset used in Fig. 1j–l and 2,221 slices for the dataset used in Figs. 2h–j and 3n,o. For every slice with a bounding box, the 2D image features were extracted and a random forest classifier was trained to predict the labels. One classifier was trained for each dataset. The random forest classifiers for the two datasets were trained to a depth of 25, with 50 random trees and max_samples of 0.05. The trained models were then used to predict the labels for each pixel for each z slice in each time point of the movies. For both datasets, the segmentation masks were predicted for one time point per hour. After predicting the labels for each 2D segmentation, the output masks were stacked to a 3D segmentation mask. After this, the organoid and the lumen masks were extracted and smoothed by applying a scikit-images Gaussian filter with a voxel size of 1.388 in the z direction and 2 in both the x and y directions to the binary mask and then rebinarizing and combining the masks back to one segmentation image. After this step, each z slice of the 3D segmentation masks were further processed in 2D. First, small lumens were removed using the binary_closing function from scikit-image with a disk size of 3. Any holes in the lumen masks were filled and any background pixels within the organoid were assigned the lumen label. Furthermore, any voxels labelled lumen outside of the convex hull of the organoid were assigned the background label. The slice-wise 2D segmentations were then reassembled into a 3D tissue segmentation. As only one organoid is present in one movie, label and regionprops functions from scikit-image were used to extract the largest non-background object for further downstream analysis. Before tracking and morphology quantifications in Fig. 2h–j and Extended Data Fig. 4e–j, the lumen masks were upscaled to a 0.694 μm pixel size in xy, and two additional preprocessing steps were introduced to ensure temporal stability and reduce the amount of false-positive fusion detections. A temporal median filter was applied across all lumen masks of one movie with a filter size of 5 across the temporal axis and a size of 1 across the image directions (xyz). To reduce the number of false-positive fusions, a binary opening with four iterations was applied to the lumen masks. Finally, connected components were used to label the lumen and any lumen smaller than 20,000 μm3 was removed.
The resulting masks were visualized in 3D for Figs. 1i and 2h, Extended Data Fig. 4b,d,i,j, Supplementary Videos 4 and 8. The masks were rescaled to an isotropic resolution of 1.388 (Fig. 1i and Extended Data Fig. 4j) or 2.776 μm (Fig. 2h), extracting each lumen surface using scikit-images marching cubes, with a step size of 1 and level 0 and the organoid surface using marching cubes with a step size of 2 and level 0. Next, the surface meshes were cleaned using pymeshfix (v0.16.1; https://github.com/pyvista/pymeshfix/tree/main) clean_from_arrays and then meshes were smoothed using the filter_taubin function from trimesh (v3.13.0) (https://github.com/mikedh/trimesh) with 50 iterations. The surface meshes were then plotted using matplotlib (v3.5.2) plot_trisurf with the bounding box being equal to the maximum length in any of the xy and z direction. For Extended Data Fig. 4a and Supplementary Videos 3 and 7, using the organoid masks, the 90th percentile largest organoid z slice was calculated for every time point in every movie. Then, a sliding average was applied over the calculated 90th percentile z slices with a window size of 20 for Extended Data Fig. 4a and Supplementary Video 3 and a window size of 5 for Supplementary Video 7. Finally, for each of these slices, the images were false coloured according to the masks.
Tissue quantification
To characterize the tissue architecture, the organoid tissue and lumen sizes were quantified for the first 124 h for both Figs. 1j–l and 2h–j and Extended Data Fig. 4. For lumen quantification, any lumen with a size smaller than 20,000 was removed using scikit-images remove_small_objects function. Furthermore, scikit-images label and regionprops functions were used to assess the number of lumen, the major axis length and the size of each lumen. For Fig. 1j–l and Extended Data Fig. 4c, for the organoid at position 2, the measurements were not considered after hour 49, as the camera moved to another position. The organoids at position 4 between hours 43 and 48 were not considered as the organoids move out of the field of view. For Extended Data Fig. 3, the light-sheet movies were segmented using Ilastik and post-processed, as above, and scikit-images regionprops functions were used to assess the volumes and the major axis length.
Single-cell segmentation
To quantify changes in cell morphologies, 3D cell segmentation was performed using EmbedSeg32. One 3D image for every 24 h was created and was segmented for each movie. For the dataset in Fig. 3b,c,f,g, one position was segmented; for Fig. 3h–o, three positions per condition were segmented. The agarose condition was further supplemented with one additional red channel movie. Before training the model, the images were bi-linearly downscaled by a factor of 2 in the xy direction. Hand-selected 3D volumes were annotated using labkit74 and then used to train the model in a human-in-the-loop manner, where iteratively predicted 3D volumes were corrected and then used to retrain the model. In total, 5,850 cells were hand corrected in this manner. For creating the crops for training the model, the medoid was used and a n-sigma of 4 was used to calculate the crop size. Furthermore, a speed up factor of three was used to create the crops. The model tile size was set to 600 for both the x and y directions and 80 in the z direction. The model was trained on 5,512 cells, and 338 cells were set aside as a validation set. The 3D model was trained for 200 epochs, with a batch size of 8, and then the model that had the best iou performance of 0.72 on the validation set was selected. Using tiling, the 3D volumes were segmented, with a fg_thresh of 0.4, seed_thresh of 0.7 and test-time-augmentation set to true.
Cell morphology quantification
Morphometrics75 (https://github.com/morphometrics/morphometrics) was used to extract cell morphology measurements from the segmented images. The images were rescaled bi-linearly, and the masks with a nearest neighbour interpolation set to an isotropic voxel size of 0.694 µm. To speed up the measurements, masks with a volume smaller than 100 voxels (33.4 µm3) were removed using remove_small_objects from scikit-image. The surface, size, intensity, position and moment measurements were extracted for each cell mask using morphometrics. In addition, the major and the minor axis lengths for each cell were measured using the scikit-images regionprops function.
Quality control and demultiplexing
Before the manual quality control steps, masks with a volume of lower than 100 and a maximum intensity of less than 20 were removed from further analysis, leading to a total of 299,027 subcellular structures. Furthermore, any measurements containing NaNs (‘not a number’ values) were also removed from the analysis. Scanpy76 and morphometrics were used to normalize, dimensionally reduce using PCA and cluster (Leiden) the dataset. Then, random cells were extracted, from different time points, positions and clusters, and labelled to either be a histone, lamin, actin, tubulin, a cell membrane or a faulty mask. In total, 3,379 cells were hand annotated in this manner. The annotated cells were then split into a train (90%) and a test set (10%). For demultiplexing all measured measurements from morphometrics, channel information, as well as the major and minor axes were used. The training data were used to perform a fourfold cross-validation grid search on random forest classifiers from scikit-learn on the number estimators and the maximum depth of the tree. The best-performing random forest classifier was selected for demultiplexing and quality control. After this, wrongly demultiplexed structures, which either did not exist in the experiment or originated from the wrong channel, were removed. After manual quality control, a total of 59,053 structures remained. The model performance was then evaluated and achieved an accuracy of 0.73 on the test set. Furthermore, stratified fivefold cross-validation, across the whole dataset, was used to estimate the cross marker mispredictions. To do this, the model was retrained on the training set on each fold, the confusion matrix was calculated for each test set of a fold and then summed over all folds. Any low-quality mask predictions or labels were then removed, and the remaining protein markers were used to calculate the marker specific precision.
Morphology analysis
All morphological analyses were run using non-intensity-dependent measurements and additionally the axis length ratio was added. Scanpy76 was used to morphologically analyse the segment markers. For Fig. 3h–o, only actin cells from the ECM perturbation experiment were used and the measurements were scaled and dimensionally reduced using the pca function from morphometrics. The neighbourhood graph was then calculated using the first five principal components and then clustered using Leiden clustering with a resolution of 0.6 using the cluster features function from morphometrics. After this, clusters 8 and 9 contained only debris and were removed from the further analysis, and the data were rescaled, dimensionally reduced and clustered with the same parameters. After clustering, a graph embedding was calculated using PAGA33 with a threshold of 0.1, and the daily proportions of the different clusters for the conditions were calculated. For Fig. 3g,h, to obtain a 2D representation of the data, a PAGA-initialized UMAP embedding was calculated. On the basis of the axis length ratio of the cells in the clusters, the clusters were either coloured as containing elongated (cyan) or non-elongated cells (yellow). To create Fig. 3k, the first non-zero element from the 3D actin masks used in Fig. 3h–j was extracted. This was then used to extract a 2D image with the corresponding voxels from the intensity image and to colour the pixels according to the cluster membership. To estimate the diversity of the cell morphologies throughout the time series, the Shannon index was calculated for every time point and every condition: H′ = –∑i pi × log2(pi), pi = ni/N, where N is the total number of cells of all clusters per day and condition and ni is the number of members of a cluster for a specific condition and day. The confidence intervals were estimated through a bootstrap analysis, by randomly sampling with replacement from the whole dataset and recalculating the Shannon index for each condition and day. By sampling for 1,000 times, the 95% confidence interval was estimated. For Fig. 3b,c,f,g, all demultiplexed markers were used and the neighbourhood graph was then calculated using the first 4 principal components and then clustered using Leiden clustering with a resolution of 1.5. After clustering, a graph embedding was calculated using PAGA33 with a threshold of 0.1 and a PAGA-initialized UMAP embedding was calculated. For the PAGA path heatmaps, the diffusion pseudotime starting at cluster 2 was calculated and then plotted with an n-average of 50. For Extended Data Figs. 6e and 7b, the neighbourhood graphs were then calculated using the first 4 principal components and then clustered using Leiden clustering with a resolution of 0.4. After this, the PAGA graphs and the PAGA-initialized UMAPs were calculated. The PAGA cluster analysis was repeated using a Leiden clustering resolution of 16 and PAGA plots and heatmaps were created for the Extended Data Fig. 7e,f with an n-average of 20.
Cell angle analysis to the surface
To calculate the cell angle to the surface, first an ellipsoid was used to fit to the surface of the 3D masks of the individual cells (https://github.com/aleksandrbazhin/ellipsoid_fit_python). The surfaces of the binary subcellular feature masks were extracted using the marching_cubes function from scikit-image with a step size of 2 and level 0. Then, to estimate the directionality of the cell masks, the major axis of the 3D ellipsoid was extracted. Any cells where the ellipsoid had negative radii or radii larger than 180 voxels were excluded from further analysis. The organoid mask was upscaled to the same dimensions as the cell segmentation masks and used to extract the nearest surface normal for each cell. To obtain the surface normal, a surface mesh was extracted using scikit-images marching cubes, with a step size of 40 and level 0. For all surface meshes, the pymeshfix clean_from_arrays function was used to clean the surface mesh. For the organoid surface mesh, the filter_taubin function from trimesh with 50 iterations was used to smooth the surface. spatial.cKDTree from scipy (v.1.8.1) was used to obtain the nearest surface normal from the centroid of each cell. The absolute cosine similarity from scikit-image between the nearest surface normal and the major axis of each ellipsoid was calculated. We defined the alignment index as the absolute cosine similarity between the major axis of the ellipsoid and the nearest surface normal of the organoid. For Fig. 3n,o, only actin was considered, whereas for Extended Data Fig. 7g, all demultiplexed markers were used. All plots for Fig. 3n and Extended Data Fig. 7g were created using the ellipsoid_plot (https://github.com/aleksandrbazhin/ellipsoid_fit_python).
Tracking
All tracking was done using motile (https://github.com/funkelab/motile). First, detections were created for each to be tracked mask labels to create unique identifiers for each label. After this, a candidate graph was built by adding nodes to a networkX graph and noting the centroid location of each mask, accounting for anisotropy as well as adding a node feature for a minimum volume cut-off. The node feature was calculated: 1 – (minvol/volume). After the candidate graph was built, edges between the cells of two successive cells were added to the graph. Only edges, where the distance between the centroids of the objects were closer than a maximum distance cut-off, were added. Any edges with a zero weight were removed. After this, the edges were added to the networkX graph and the graph was converted to a motile candidate graph. Motile was then used to initiate a solver. The node and the edge selection weight were then set to −1 and added as costs. Furthermore, appear and disappear costs, as well as maximum number of parents and children constraints were added. Using this, an optimal tracking solution was then solved for and the candidate graph converted to the track graph by removing any edge or node with a solution value of less than 0.5.
Lumen tracking
The approach as described above was used to track all 16 movies used in Fig. 2. The lumen were tracked at one-quarter resolution, with a maximum distance of 1,110 μm. The edge metric used was the maximum intersect between two objects i and j: max(overlapij/volumei,overlapij/volumej). Furthermore, appear and disappear costs of 0.8, as well the constraints for a maximum of 2 children and 2 parents were added. The fusion events were extracted from the solution adjacency graph and used to calculate the number of fusion events per time point. For each movie, the number of lumen fusion over time was then smoothed using a rolling mean of 10, with a minimum period of 1. To visualize the lumen tracks, the organoids were rendered as described above. To colour the 3D renderings to show the number of hours since the fusion event, the nodes of the organoids that have just fused were extracted and then the shortest path length from every lumen to the next fusion event, up to 10 time points away, in the nx graph was calculated.
Cell tracking
For the cell tracking, every time point of the green channel, containing actin cells in three movies, was segmented before registration. Then, the green channel was registered as described above, and the translations were applied to the cell masks and the organoid image. A cell track graph was built for each of these movies. The maximum distance used was 24.3 μm, any mask below 1,444 μm3 was discarded and the minimum volume node feature was set to 481.6 μm3. The edge metric used was the average between the distance between the centroids and the IOU of the label within their bounding boxes. The distance metric was defined as 1 – (dist/maxdist): the distance between the centroid of two cells within the maximal distance. For the IOU of labels within the bounding box, the mask of cells within the neighbourhood were extracted, padded to equal size and then the IOU between the cell’s masks were calculated. To create the tracklets, the maximum number of children and parents was set to one.
For calculating the net coordinated movement of the cells, the centre of mass from scipy of the segmentations, as well as the velocity vector of the tracks at this time point up to a length of 24 h were calculated. Then the cosine of the angle was calculated between the vector from the centre of mass to the centroid of the cell and the velocity vectors, where movement along the vector away from the centroid was 1 and movement towards the centre of gravity was −1.
The stream plot was created by analysing the directionality of the tracks in the current frame up to 24 h ago. Where a 2D representation of the movement of the tracks up to 24 h ago was calculated for each track at the centroid of each tracked cell. Then, a Gaussian smoothing filter of size 20 was used to smooth out the u, v and w velocity arrays. The angle in z was calculated as the arctan2 of w and the square root of the u and v velocity vectors, and the angle in xy was calculated using the arctan2 between u and v. Both of these angle arrays were then normalized to between 0 and 1. A colour array was constructed by using a 2D custom colour map that included the colours of the different directions, as well as each colour on a colour gradient from white to the xy colour to black for the z direction. Finally, all of this was combined into a stream plot.
The quiver plot was created similarly. For each track at a given time point, the velocities u, v and w from the track in the current frame up to 24 h ago in x, y and z were calculated. After this, the angles in z and xy were calculated, and the quiver plot was coloured by the directionality for the four directions in xy and again from white to the xy colour to black in z. For the Supplementary Video 9, the quiver plot was overlaid onto the GFP channel for every time point. To calculate the velocity components, the centroids of the tracks were smoothed with a rolling mean of 10 and a minimum period of 1. The displayed vectors were calculated using the displacement of the cells up the previous five time points.
4i Image analysis
4i Image processing
The raw images were pre-processed on the fly in NISelements. Background subtraction and shading correction were performed for the individual images of the z-stack, followed by generating maximum intensity projections. Tiled images were acquired with an overlap of 10% and stitched with the SVI Huygens software (v23.10.0p6; http://svi.nl). Image processing and feature extraction were carried out as previously described52 and re-implemented as a stand-alone Python package (phenoscapes). In brief, organoids were segmented within an initial masking step, followed by image registration based on the DAPI signals of all imaged cycles using ITK-Elastix (https://github.com/InsightSoftwareConsortium/ITKElastix), mask refining, cropping and denoising. Furthermore, we added processing modules for speckle removal and background estimation using SMO77, with SMO being applied before estimating a background model for respective imaging and elution cycles. Nucleus and cell segmentations were performed using Cellpose78, and fluorescence intensity and morphology features were extracted. The added modules work as described in the following section.
Speckles were determined to be bright spots that occurred across multiple channels and cycles. Therefore, for each stain, a Gaussian filter of size 5 and a threshold based on the top 2 percentile intensity were applied. Then, a speckle mask was created for pixels that occur in this filtered top two percentile across all channels or across four successive cycles in the same channel. The speckle masks were used to zero-out any high-intensity regions after background subtraction.
The lumen masks were extracted using the cell segmentation of the CTNNB1 stain. First, the labelled cell segmentations were binarized and combined with the speckle masks using binary fill holes. The masks were then smoothed using a Gaussian filter with size 20 and then binarized again. Any small holes (less than 238 µm2) within the organoid were then removed. Any background pixels within the organoid were considered to be lumen. The contours of both the organoid and the lumen masks were extracted. The distance from each cell to the nearest surface or lumen contour was calculated using KDTree from Scipy. Regionprops was used to extract the major axis length of each lumen.
The codebase for the re-implemented image processing pipeline is available (https://github.com/quadbio/morphodynamics_human_brain_organoid/phenoscapes).
4i Processing for the agarose and compartment analysis
The pipeline, as described above, was run with the following modifications. Before running the pipeline, individual organoids were extracted from the region of interest using a modified version of the simple masking module of phenoscapes, where connected components were used to extract all organoid masks and the bounding boxes surrounding these masks. The bounding box was expanded up to 400 pixels. Small organoids (less than 26,406.25 μm2) were excluded from further processing. Before calculating the alignment, the brightest 2% of the image was masked out. For the refined masking step, only the DAPI signal and a Gaussian filter with a sigma of 50 were used. For illustrating in Fig. 4, the staining image of ARL13B was bi-linearly downscaled to 1/10 resolution.
Yap stain qualifications
The pipeline was run as described above, with the following modifications. As described in 4i processing for the compartment analysis, before running the pipeline, individual organoids were extracted and small organoids (less than 26,406.25 μm2) were excluded from further processing. The bounding boxes for organoid extraction were expanded up to 128 pixels. For the refined masking step, only the DAPI signal and a Gaussian filter with a sigma of 50, as well as n_binary_operations of 30 were used. Furthermore, a speckle removal using the top three percentile was used. To segment the DAPI channel, the Cyto2 cellpose model with a diameter of 40.8, a flow threshold of 0.8 and a cell probability threshold of −1 was used. The mean and the median nuclear yap stain fluorescence were then quantified.
4i Compartment analysis
We fine-tuned cellpose models to either segment the CTNNB1 stain or the DAPI stain. The fine-tuned cellpose models used a cellprob_threshold set of 0.0 and a flow_threshold of 0.9. We used both the cellular and the nuclear segmentations to extract the three compartments: nuclei, cytoplasm or the ECM niche. For the nuclei and the cytoplasmic compartments, we calculated the IOU between each cell and nuclei within the cellular bounding box and assigned the nucleus with the highest IOU, but at least an IOU of 0.2 to each cell. Any cells without a nucleus or nuclei without a cytoplasm were discarded. For the extracellular compartment, the cell segmentations were expanded by 30 pixels and the rest of the cell zeroed out. For each compartment, we extracted features based on the mean intensity. Finally, we only included the measurement of the compartment where the proteins are known to occur. First, we analysed the cytoplasmic and nuclear components. For each stain, we clipped the lower 5th and upper 98th percentiles before further analysis. Any cell with either a nuclear or cellular compartment that is smaller than 50 or a combined size of smaller than 100 pixels was removed. Using Scanpy, we then scaled, calculated the PCA, extracted nearest neighbours with n_pcs set to 10, clustered using the Leiden algorithm with a resolution of 0.9 and extracted the UMAP with a min_dist of 0.2. We annotated the clusters based on the regions and cell types revealed from the dot and feature plots. We overlaid these regions onto the segmented compartments. To analyse the lumen morphology composition, each lumen was designated a region using the most often annotated region of the cells that have been assigned to the lumen and are at most 50 μm away.
The same workflow was applied to the extracellular compartment. However, any extracellular compartment that was smaller than 100 pixels was removed. Using the mean intensity measurement and Scanpy, the PCA was calculated; using 6 principal components, the neighbours were extracted, Leiden clustering was used with a resolution of 0.3 and the UMAP extracted using a min_dist of 0.2. For the analysis, the annotations of the nuclear or cellular compartment were used to add a region identifier to each corresponding extracellular compartment, as well as the abundance of each cluster in each brain region. All violin plots were calculated before the percentile clipping for both conditions and for the proteins in their respective compartment, except for the cytoplasmic ECM components, where the ECM protein intensities were measured in the cytoplasmic compartment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.