Molecular basis of polyadenylated RNA fate determination in the nucleus

DNA sequences

All oligonucleotides and plasmid vectors are annotated in Supplementary Table 6.

Purification of UAP56 and UAP56(Δ1–43)

His-tagged UAP56 constructs (10×His–3C–UAP56 or 10×His–3C–UAP56(Δ1–43), residues 44–428) were expressed in Escherichia coli BL21 DE3 RIL using autoinduction medium at 37 °C for 16 h. Following collection, cells were resuspended in lysis buffer (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole, 0.05% Tween-20, and protease inhibitors), disrupted via sonication, and clarified by centrifugation. The supernatant was sequentially filtered through 1-µm and 0.45-µm filters before affinity purification on a HisTrap HP 5 ml column (Cytiva), equilibrated in buffer A (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole). After washing with buffer A supplemented with 70 mM imidazole, bound proteins were eluted using a linear gradient of imidazole (70–200 mM in buffer A). Peak fractions were diluted in buffer B (25 mM HEPES pH 7.9, 5% glycerol, 1 mM DTT) to reduce the NaCl concentration to 100 mM and subsequently subjected to anion-exchange chromatography on a HiTrapQ 5 ml column (Cytiva), pre-equilibrated with buffer B. Elution was performed with a linear NaCl gradient (100–500 mM). Fractions containing UAP56 were concentrated and further purified via size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva), equilibrated in buffer C (25 mM HEPES pH 7.9, 5% glycerol, 100 mM NaCl, 1 mM DTT). Peak fractions containing the purified protein were pooled, concentrated, flash-frozen, and stored at −80 °C.

Purification of LENG8–PS^M and SAC3D1–PS^M

Expression constructs encoding LENG8–PS^M (10×His–MBP–LENG8^491–800, 3×V5–PCID2, SEM1), SAC3D1–PS^M (10×His–MBP–SAC3D1^48–404, 3×V5–PCID2, SEM1), SAC3D1–PCID2–UAP56–UCM–N-UBM–SEM1 (10×His–MBP–SAC3D1^48–404, 3×V5–PCID2–UAP56–UCM–N-UBM, SEM1), LENG8–PCID2–UAP56–N-UBM–SEM1 (10×His–MBP–LENG8^491–800, 3×V5–PCID2–UAP56–N-UBM, SEM1) and their respective mutants were introduced into E. coli BL21 DE3 RIL (UCM is a UAP56-clamping motif). Cultures were grown in LB medium at 37 °C to OD600 ~1.0, at which point expression was induced with 0.5 mM IPTG, followed by overnight incubation at 18 °C. Cells were collected, lysed by sonication, and clarified by centrifugation. The supernatant was filtered (1 µm and 0.45 µm) and loaded onto a HisTrap HP 5 ml column equilibrated with buffer A, followed by washing and elution using a linear imidazole gradient up to 300 mM. Peak fractions were diluted to 50 mM NaCl in buffer B and subjected to anion-exchange purification on a HiTrapQ HP 5 ml column. After washing, complexes were eluted with a NaCl gradient (100–500 mM). Size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva) in buffer C containing 250 mM NaCl yielded the final purified complex, which was concentrated, flash-frozen, and stored at −80 °C.

Recombinant EIF4A3 was purified as described previously¹⁶.

His-tagged DDX19 was expressed in E. coli BL21 DE3 RIL using LB medium, induced with 0.5 mM IPTG and expressed at 37 °C for 3 h. Following collection, cells were resuspended in lysis buffer (25 mM HEPES pH 7.9, 5% glycerol, 300 mM NaCl, 20 mM imidazole, and protease inhibitors), disrupted via sonication, and the lysate was cleared by centrifugation. The supernatant was sequentially filtered through 1-µm and 0.45-µm filters before affinity purification on a HisTrap HP 5 ml column (Cytiva), equilibrated in buffer A. The column was washed with buffer A containing 30 mM imidazole and bound proteins were eluted using a linear gradient of imidazole (50–300 mM). The peak fractions were incubated with 3C protease to cleave off the tag, and after 3C cleavage the peak fractions were diluted in buffer B to reduce the NaCl concentration to 50 mM, filtered through a 0.22-µm filter and next subjected to anion-exchange chromatography on a HiTrapQ 5 ml column (Cytiva), pre-equilibrated with buffer B supplemented with 50 mM NaCl. The column was washed with buffer B supplemented with 50 mM NaCl following sample loading. Elution was performed with a linear NaCl gradient (50–500 mM). Peak fractions containing DDX19 were concentrated and further purified via size-exclusion chromatography using a HiLoad 16/600 Superdex 200 pg column (Cytiva), equilibrated in buffer C. Peak fractions containing the purified protein were pooled, concentrated, flash-frozen, and stored at −80 °C.

Analytical gel filtration

For each purified protein or complex an aliquot of 62.5 μg was loaded onto a Superdex 200 Increase 5/150 column (Cytiva), equilibrated in the respective gel filtration buffers. Peak fractions were analysed via SDS–PAGE (4–12% gradient) and visualized by Coomassie staining.

UAP56–LENG8 and UAP56–SAC3D1–PS^M pulldown

MBP-tagged LENG8–PS^M or SAC3D1–PS^M was incubated with a fourfold molar excess of UAP56 or UAP56(Δ1–43) in buffer D (25 mM HEPES pH 7.9, 40 mM NaCl, 5% glycerol, 0.01% Igepal CA-630, 1 mM MgCl₂, 1 mM TCEP), with or without 50 μM 15U RNA and 1 mM AMP-PNP. Reactions were mixed by rotation at 4 °C for 1 h before adding 30 μl of pre-equilibrated amylose resin (E8021S, NEB). After an additional 1-h incubation at 4 °C, unbound proteins were removed by centrifugation (1,500g, 2 min, 4 °C) and 3 washes with buffer D. Bound proteins were eluted by incubation at 4 °C for 1 h in buffer D supplemented with 100 mM maltose. Input and elution fractions were analysed via SDS–PAGE (4–12% gradient) and visualized by Coomassie staining.

LENG8–ZFC3H1 pulldown

Magnetic Streptavidin beads (50 μl in-house produced slurry per reaction) were equilibrated in wash buffer (25 mM HEPES pH 7.9, 100 mM NaCl, 5% glycerol, 1 mM MgCl₂, 1 mM TCEP, 0.01% Igepal CA-630). Wild-type or mutant ZFC3H1 peptide (200 µg) with an N-terminal biotin and a C-terminal fluorescein was added to the beads in a 100 μl reaction volume and incubated on a rotating wheel at room temperature for 60 min. To remove excess peptide, beads were washed three times with wash buffer. Subsequently, 15 μg recombinant LENG8(283–346) or LENG8(283–346) F301A in a 100 μl reaction volume was added to the beads and the reaction was incubated on a rotating wheel for 60 min at 4 °C. Following the incubation, beads were washed three times with wash buffer before bound proteins were eluted by incubating the beads for 5 min with 200 mM glycine pH 2.5. Elutions were neutralized with Tris pH 10.4 and separated by SDS–PAGE. To detect the fluorescently labelled peptides, gels were imaged in the Fluorescein channel on a Bio-Rad Chemidoc Imager prior to Coomassie staining to visualize the proteins.

RNA unclamping assay

Biotinylated 15U RNA (33 µM) was mixed with recombinant UAP56 (10 µM) and 1 mM ATP in buffer E (20 mM HEPES pH 7.9, 40 mM KCl, 2 mM MgCl₂, 5% glycerol, 0.1% Igepal CA-630). This mixture was incubated with 20 µl NeutrAvidin Agarose beads (29202, Thermo Scientific), pre-equilibrated in buffer E, for 30 min at room temperature. After washing to remove excess UAP56 and ATP, beads were resuspended in buffer E and aliquoted. LENG8–PS^M or SAC3D1–PS^M (2.2 µM or 0.44 µM) was added, followed by a 10-min incubation at room temperature. Unbound proteins were removed by sequential washes in high-salt buffer (buffer E with 500 mM KCl) and buffer E. RNA-bound proteins were eluted using 0.4 μg benzonase in buffer E for 10 min at room temperature, followed by SDS–PAGE analysis and quantification of remaining RNA-clamped UAP56 in Fiji.

Grating-coupled interferometry

Grating-coupled interferometry measurements were conducted using a Creoptix WAVE system (Creoptix) with 4PCP WAVEchips (quasi-planar polycarboxylate surface). Chips were conditioned in borate buffer (100 mM sodium borate pH 9.0, 1 M NaCl) before immobilization of a monoclonal anti-V5 antibody (R960252, Invitrogen; 2 μg ml⁻¹ in 10 mM sodium acetate pH 5.0) via amine coupling. The surface was then passivated with 0.5% BSA (in 10 mM sodium acetate pH 5.0) and quenched with 1 M ethanolamine pH 8.0. V5-tagged LENG8–PS^M or SAC3D1–PS^M complexes were captured to the desired density. UAP56 was injected as a 1:2 dilution series, starting at 5 µM, with or without 200 µM 15U RNA, in 25 mM HEPES pH 7.9, 50 mM KCl, 1 mM MgCl₂, 1 mM TCEP, with and without 1 mM ATP at 25 °C. Blank injections were used for double referencing, and a DMSO calibration curve corrected for bulk refractive index effects. Data were processed using Creoptix WAVEcontrol software, applying x/y offset correction, DMSO calibration, and double referencing. A one-to-one binding model was used for fitting, and results were plotted in R.
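As an illustration only, the 1:2 dilution series and the shape of a one-to-one (Langmuir) binding response can be sketched as follows. This is not the WAVEcontrol fitting routine; the function names, the number of dilution points and the rate constants used in the usage note are hypothetical.

```python
import math


def dilution_series(top_conc, n_points, factor=2.0):
    """Concentrations of a serial dilution, highest first (same unit as top_conc)."""
    return [top_conc / factor**i for i in range(n_points)]


def one_to_one_response(t, conc, ka, kd, rmax, t_off):
    """Response of a 1:1 interaction at time t (s).

    conc is the analyte concentration (M), ka the association rate constant
    (M^-1 s^-1), kd the dissociation rate constant (s^-1), rmax the response
    at saturation. Analyte is injected from t = 0 until t_off; after t_off
    only dissociation occurs.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # plateau reached at this concentration
    if t <= t_off:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_end = r_eq * (1.0 - math.exp(-k_obs * t_off))
    return r_end * math.exp(-kd * (t - t_off))
```

For example, `dilution_series(5.0, 4)` gives 5, 2.5, 1.25 and 0.625 (µM), and with hypothetical constants `ka = 1e5` and `kd = 1e-2` the plateau at `conc = kd / ka` is half of `rmax`, as expected when the analyte concentration equals the dissociation constant.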

ATPase assay

Steady-state ATPase activity of UAP56 was measured using an NADH-coupled enzymatic assay. Final reaction mixtures contained 5 U ml⁻¹ rabbit muscle pyruvate kinase (Type III, Sigma-Aldrich), 5 U ml⁻¹ rabbit muscle L-lactic dehydrogenase (Type XI, Sigma-Aldrich), 500 µM phosphoenolpyruvate, and 50 µM NADH. Reactions (10 µl) were assembled in 1,536-well plates using buffer F (25 mM HEPES pH 7.9, 40 mM KCl, 0.5 mM MgCl₂, 5% glycerol, and 0.5 mM ATP), with either 2 µM UAP56 or 0.1 µM UAP56 in the presence of LENG8–PS^M or SAC3D1–PS^M, and 100 µM 15U RNA when indicated. The decrease in NADH fluorescence emission was monitored at 37 °C using a PHERAstar FS plate reader (BMG LABTECH). A calibration curve from an NADH dilution series (0.03–100 µM) was used for quantification. ATPase activity was determined by linear regression of the NADH decay curves, corrected for ATP consumption, and expressed as ATP hydrolysis rates (molecules of ATP hydrolysed per second per enzyme). Reaction components were analysed by SDS–PAGE (4–12% gradient) and visualized using Coomassie staining.
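A minimal sketch of how a turnover rate can be derived from an NADH decay curve is given below. It assumes that fluorescence has already been converted to NADH concentration via the calibration curve and that one NADH is oxidized per ATP hydrolysed, as is standard for the coupled assay. The correction for ATP consumption mentioned above is not specified in detail and is not modelled here; the function names are hypothetical.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def atp_turnover(times_s, nadh_um, enzyme_um):
    """ATP hydrolysed per second per enzyme from an NADH decay curve.

    times_s: time points in seconds; nadh_um: NADH concentration in µM at
    each time point; enzyme_um: enzyme concentration in µM.
    """
    nadh_rate = -linear_slope(times_s, nadh_um)  # µM NADH consumed per second
    return nadh_rate / enzyme_um
```

With a synthetic curve that loses 0.02 µM NADH per second at 2 µM enzyme, `atp_turnover` returns 0.01 s⁻¹.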

Cryo-EM sample preparation, imaging, and analysis

Cryo-EM sample preparation

For cryo-EM sample preparation we adopted a strategy previously used for UAP56–TREX-2^M (ref. ⁷): we fused UAP56 to PCID2 to optimize complex stoichiometry and further fused UAP56 to a UCM and N-UBM. The latter was done to further promote the RNA-clamped conformation of UAP56 and more accurately mimic the native mRNP-bound state of UAP56, where N-UBM and UCM are present at high local concentrations to engage RNA-bound UAP56². The N-UBM and UCM peptides are not observed in our cryo-EM structures and hence are not depicted or discussed in the main text. For cryo-EM grid preparation, LENG8–PCID2–UAP56–N-UBM–SEM1 (at 0.5 mg ml⁻¹) or SAC3D1–PCID2–UAP56–UCM1–N-UBM–SEM1 (at 0.5 mg ml⁻¹) were incubated in buffer G (25 mM HEPES pH 7.9, 5% glycerol, 1 mM MgCl₂, 1 mM TCEP, 100 μM 15U RNA) with 1 mM AMP-PNP or 1 mM ATP on ice for 10 min. Cryo-EM grids were then prepared by applying 4 µl of the sample to glow-discharged Cu R1.2/1.3 200-mesh holey carbon grids (Quantifoil). Grids were blotted at 8 °C and 90% humidity and plunged into liquid ethane using a Leica EM GP2.

Cryo-EM data acquisition and processing of a UAP56–LENG8–PS^M complex AMP-RNP

Data collection was performed on a Titan Krios G4 electron microscope operating at 300 kV, equipped with a cold field emission gun, a Selectris energy filter (5 eV slit width, ThermoFisher), and a Falcon 4i direct electron detector (ThermoFisher). The objective aperture was retracted, and a 50 µm C2 aperture was used. A total of 5,405 micrographs were recorded using EPU software in .eer format with a pixel size of 0.575 Å per pixel, a total electron dose of 50 e⁻ Å⁻², and defocus values ranging from −1 to −2.5 µm. On-the-fly preprocessing, including motion correction and contrast transfer function estimation, was performed using the CryoSPARC⁴⁸ Live v113 workflow. Approximately 1.3 million particles were picked in WARP, extracted with a 400 Å box, binned to 1.8 Å per pixel, and subjected to 2D classification. Ab initio reconstructions of 45,345 particles selected from the 2D classification yielded an initial map for clamped UAP56 bound to LENG8–PS, which was further subjected to non-uniform refinement, from which 7,886 particles were selected by per-particle scale. These were then 3D refined in Relion 5.0 using BLUSH⁴⁹, resulting in a 6.2 Å UAP56–RNA–LENG8–PS Map F.

Cryo-EM data acquisition and processing of LENG8–PS^M and UAP56-NTD–LENG8–PS^M complexes

Data were collected and pre-processed as outlined above. A total of 6,578 micrographs were recorded using EPU software in .eer format with a pixel size of 0.575 Å per pixel, a total electron dose of 50 e⁻ Å⁻², and defocus values ranging from −1 to −2.5 µm. Approximately 1.5 million particles were picked in WARP, extracted with a 400 Å box, binned to 1.8 Å per pixel, and subjected to 2D classification, yielding 183,858 LENG8–PS particles. Ab initio reconstruction considering high-resolution frequencies resulted in an interpretable LENG8–PS cryo-EM map from 82,873 particles. These were then re-extracted with a 400 Å box, binned to 0.90 Å per pixel and subjected to a non-uniform refinement yielding the 3.5 Å resolution LENG8–PS Map D. Further 3D classification in Relion 5.0 revealed a subset of 4,824 particles with the UAP56 NTD bound, which refined to 4.86 Å (UAP56-NTD–LENG8–PS, Map E).

Cryo-EM data acquisition and processing of a SAC3D1–PS^M and a UAP56–SAC3D1–PS^M complex

We collected three datasets with the same microscope specifications and settings as for UAP56–LENG8–PS. Dataset 1 consists of 11,743 micrographs, dataset 2 consists of 6,006 micrographs collected at a tilt angle of 30° and dataset 3 contains 4,543 micrographs. We again performed on-the-fly preprocessing (patch motion correction and contrast transfer function estimation) using the CryoSPARC Live routine before picking 4.5, 1.4 and 0.5 million particles (datasets 1, 2 and 3, respectively) in WARP. For processing in CryoSPARC, particles were extracted with a 400 Å box and binned to 1.8 Å per pixel. After 2D classification we obtained 276,000, 47,000 and 93,000 UAP56–SAC3D1–PS^M particles and conducted three rounds of heterogeneous refinement using ab initio models generated with the particles from dataset 1 (ref. ⁵⁰). The resulting 129,495 particles were then re-extracted with a 400 Å box and binned to 0.90 Å per pixel and subjected to a non-uniform refinement yielding the 3.0 Å UAP56–SAC3D1–PS complex Map A. A further local refinement using a UAP56 mask resulted in the 2.6 Å UAP56–AMP-PNP–RNA Map B. The 2D-selected particles from dataset 3 (~93,000) were further subjected to ab initio reconstruction considering high-resolution information, yielding a readily interpretable cryo-EM map of a SAC3D1–PS^M complex. Re-extraction with a 400 Å box and binning to 0.90 Å per pixel, non-uniform refinement and selection of particles per scale yielded 17,936 particles, which allowed for the refinement of a SAC3D1–PS^M complex cryo-EM map to 3.60 Å (Map C).

Model building

Structural modelling of all complexes began with AlphaFold2 Multimer⁵¹ predictions of the respective complexes. The predicted models were docked into the respective maps and manually adjusted using COOT and ISOLDE in ChimeraX. Final refinements were performed in Phenix using the phenix.real_space_refine protocol, applying secondary structure and rotamer restraints to optimize fit and stereochemistry.

HeLa cell culture and cell line generation

HeLa Kyoto or HCT116 cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin at 37 °C, 5% CO₂. Transient transfections were performed using Lipofectamine 3000 (Invitrogen), according to the manufacturer’s instructions. CRISPR–Cas9-mediated genomic knock-ins using homology-dependent repair donor vectors²⁵ of C-terminal 3×Flag, 2×HA–FKBP12(F36V)–V(dTAG)²⁷ in HeLa Kyoto and HCT116 cells were carried out as described before⁸ with single guide RNAs (sgRNAs) and homology arms generated using primers listed in Supplementary Table 6 and cloned into tagging cassettes carrying Hygromycin or Neomycin resistance genes (plasmids listed in Supplementary Table 6). After transfection and antibiotic selection, single-cell clones were grown and tested by genomic PCR with primers flanking the insertion region, as well as by western blotting analysis. In the GANP–2×HA–dTAG cell line, we observed an additional band, which we interpreted as a truncated protein isoform localized to the cytoplasm. This isoform is produced from an RNA transcript that uses an early polyadenylation site upstream of the tag insertion position. Since dTAG^V-1 treatment led to rapid and substantial reduction of full-length GANP, we opted to utilize this cell line.

To generate cells stably expressing wild-type and mutant variants of LENG8–3×Flag and ZFC3H1–3×Flag, HeLa cells were transfected with pBAC vectors as described. Human LENG8 and ZFC3H1 cDNA constructs were cloned and inserted into piggyBAC (pBAC) vectors⁵² using NEBuilder HiFi DNA assembly (NEB). The LENG8 CDS was inserted into a doxycycline-inducible pBAC vector, harbouring a C-terminal 3×Flag tag and a puromycin selection marker. The ZFC3H1 CDS was inserted into a constitutively expressed pBAC vector, harbouring a C-terminal 3×Flag tag and a Blasticidin selection marker. Generated constructs are listed in Supplementary Table 6. LENG8- and ZFC3H1–2×HA–dTAG cells were transfected with the pBAC vectors along with a piggyBAC transposase expressing vector (pBASE) with Lipofectamine 3000. Cell pools were selected with puromycin or Blasticidin for 7–10 days until negative control cells died. For induction of expression of exogenous LENG8–Flag pBAC constructs, cells were incubated for 24 h in culture medium supplemented with 1 µg ml⁻¹ doxycycline (Sigma-Aldrich) before collection. Expression of the constructs was validated by western blotting analysis using antibodies against ZFC3H1 or Flag. Depletion of endogenous dTAG-tagged proteins was performed by the addition of 500 nM dTAG^V-1 to the culture medium for the indicated times.

Western blotting analysis of whole-cell extracts

Whole-cell protein lysates were prepared using lysis buffer (20 mM Tris-HCl, 0.5% NP-40, 150 mM NaCl, 1.5 mM MgCl₂, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) freshly supplemented with protease inhibitors (Roche). Samples were clarified by centrifugation at 20,000 rcf for 10 min. Sample concentrations were adjusted after Bradford measurement, and samples were denatured by the addition of NuPage Loading Buffer (Invitrogen) and NuPage Sample Reducing Agent (Invitrogen) before boiling at 95 °C for 5 min. SDS–PAGE was carried out on NuPage 4%–12% Bis-Tris (Invitrogen) gels run in NuPage MOPS Running Buffer (Thermo), and proteins were transferred onto PVDF membranes in NuPage Transfer buffer (Thermo) at 4 °C, 15 V overnight. Western blotting analysis was carried out according to standard protocols with the antibodies listed in Supplementary Table 6 and HRP-conjugated secondary antibodies (Dako). Bands were visualized by Super Signal West Femto chemiluminescent ECL (Thermo) and captured using an ImageQuant 800 imaging system (GE Healthcare). The uncropped gel images with reference to panels in main and Extended Data figures are presented in Supplementary Fig. 1.

Immunoprecipitation followed by western blotting analysis

Approximately 2 × 10⁷ cells per immunoprecipitation were extracted in lysis buffer (20 mM Tris-HCl, 0.5% NP-40, 150 mM NaCl, 1.5 mM MgCl₂, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) freshly supplemented with protease inhibitors and cleared by centrifugation at 20,000 rcf for 20 min. Clarified lysates were incubated overnight at 4 °C with Flag antibody and Protein G Dynabeads (Thermo). Beads were washed three times with HT150 extraction buffer, transferring beads to a fresh tube on the final wash. For benzonase-treated immunoprecipitations, samples were resuspended in HT150 buffer freshly supplemented with protease inhibitors and 2 mM MgCl₂ and split in two. One half of each sample was mock-treated and the other incubated with 500 units of benzonase for 20 min at 25 °C, 12,000 rpm. Samples were washed twice for 5 min at room temperature in 20 mM Tris-HCl pH 8 freshly supplemented with 2 mM CaCl₂. Proteins were eluted by boiling in 1× NuPage loading buffer (Invitrogen) for 5 min. Supernatants were mixed with 10× Reducing Agent (Invitrogen) and denatured for a further 5 min at 95 °C before proceeding with western blotting analysis.

Immunoprecipitations followed by mass spectrometry

All immunoprecipitations were performed label-free and in triplicate. GANP–3×Flag, PCID2–3×Flag, LENG8–mAID–3×Flag, and control HeLa Kyoto cells were collected as described above. Protein extractions were performed using material from 15 million cells per immunoprecipitation with 1 ml extraction buffer (20 mM Tris-HCl, 1% IGEPAL, 150 mM NaCl, 1.5 mM MgCl₂, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) supplemented with 1× protease inhibitors cocktail (Roche). After brief sonication (3× 10 s, Amplitude 1, Branson Sonifier 250), the protein extracts were clarified by centrifugation (20,000 rcf for 10 min at 4 °C). Anti-Flag magnetic beads were prepared with anti-Flag M2 antibodies (Sigma F3165) conjugated to Dynabeads M-270 Epoxy (Invitrogen) as previously described⁵³. Beads were washed three times with lysis buffer; for endogenous GANP–3×Flag, LENG8–3×Flag and PCID2–3×Flag immunoprecipitations, lysis buffer with additional NaCl to 450 mM final concentration was used (high stringency). For nuclease treatment, beads were resuspended in 40 μl extraction buffer with 2 mM MgCl₂, containing 1 μl Pierce Nuclease (for TREX-2 and TREX-2-like immunoprecipitations, Sigma E1014), Benzonase (for ZFC3H1 immunoprecipitations, Sigma) or as a control 1 μl of 1 mg ml⁻¹ BSA (as indicated for the different experiments) and incubated with agitation at 25 °C for 20 min. Beads were washed with extraction buffer once and then proteins were eluted with SDS buffer (2% SDS, 100 mM Tris pH 6.8, 10% glycerol) at 25 °C, shaking for 5 min. Milder lysis and wash conditions using HT150 buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 0.5% Triton X-100) were applied for 3×Flag immunoprecipitations of ZFC3H1 (both endogenously and exogenously expressed) as well as for exogenously tagged LENG8–3×Flag. Mass spectrometry sample preparations were performed with the protein aggregation capture (PAC) procedure with proteolytic digestion on MagResyn HILIC beads using trypsin or chymotrypsin as indicated¹².
The peptides were purified and concentrated on C18 stage tips before being subjected to liquid chromatography–mass spectrometry analysis with an Easy nanoLC system coupled directly to a Thermo Scientific Orbitrap Exploris 480 mass spectrometer. Mass spectrometry data were acquired by data-dependent acquisition and searched against the UniProt protein sequence database using MaxQuant, with ‘match between runs’ and ‘label-free quantification’ enabled. The MaxQuant protein group output was analysed with the DEP package as previously described⁴⁴,⁵⁴,⁵⁵.

Chemical fractionation of HeLa cells

Chemical fractionation was performed using a protocol adapted from ref. ⁵⁶. In brief, cells collected using trypsin digestion were first lysed using cytosol extraction buffer (0.15% NP-40, 10 mM Tris pH 7.4, 150 mM NaCl). Nuclei were then separated from cytoplasmic fractions by centrifugation and washed with PBS, followed by extraction of protein using extraction buffer (20 mM Tris-HCl, 1% IGEPAL, 150 mM NaCl, 1.5 mM MgCl₂, 10 mM KCl, 10% glycerol, 0.5 mM EDTA, pH 7.9) or of RNA using TRIzol reagent according to the manufacturer’s instructions.

Immunofluorescence and colocalization analysis

Cells seeded on microscope coverslips were fixed with 4% paraformaldehyde in PBS for 20 min at room temperature, washed twice with PBS, and permeabilized with 0.1% Triton X-100 in PBS for 10 min at room temperature. Subsequently, cells were washed with PBS twice and blocked with 5% BSA in PBS-T for 1 h at room temperature. Coverslips were incubated for 1 h at room temperature with primary antibody dilution in 1% BSA, followed by three 5 min washes with PBS. Then, coverslips were incubated in a secondary antibody dilution with 1% BSA for 1 h at room temperature. Finally, cells were washed three times for 5 min with PBS, counterstained with DAPI and mounted onto glass slides using ProLong Gold Antifade Mountant. Images were acquired using a Zeiss LSM 980 confocal microscope equipped with Airyscan 2 under 40× or 63× oil-immersion Plan-Apochromat objectives. All images within the same experiment were taken with the same excitation power and exposure time and processed similarly using ZEN Blue 3.6 software. All antibodies and applied concentrations are listed in Supplementary Table 6. Pixel-based colocalization analyses were performed using the ZEN 3.6 (blue edition) colocalization module, with threshold setting based on the control background images and extracting the weighted colocalization coefficients for each image. For each cell line, the colocalization coefficient was calculated from six 40× images in two independent experiments, with at least 139 cells in total included in the analyses.

RNA extraction and RT–qPCR

HeLa, LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG and GANP–2×HA–dTAG cells were treated with 500 nM dTAG^V-1 or left untreated for 4 h. RNA was extracted using TRIzol (Invitrogen) and treated with TURBO DNase (Invitrogen) according to the manufacturer’s protocol. To measure RNA levels, reverse transcription was carried out with SuperScript III reverse transcriptase (Invitrogen) using 1 µg RNA and a mixture of 20 pmol random hexamer in a 20 µl reaction at 50 °C according to the manufacturer’s protocol. Subsequently, quantitative PCR (qPCR) was performed using Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen) in a ViiA 7 Real-Time PCR machine (Life Technologies) with the primers listed in Supplementary Table 6. Relative quantities were calculated by normalizing samples to GAPDH mRNA levels. For pA⁺ RNA-seq, RNA was quality checked on an Agilent 2100 Bioanalyzer (Agilent Technologies) for integrity before shipping to the sequencing provider.
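One common way to express qPCR signal relative to a reference gene is the ΔCt calculation sketched below. The text above only states that samples were normalized to GAPDH; the use of threshold cycle (Ct) values, the assumed amplification efficiency of 2 per cycle and the function names are assumptions made for illustration.

```python
def relative_quantity(ct_target, ct_gapdh, efficiency=2.0):
    """Target abundance relative to GAPDH from threshold cycle (Ct) values.

    efficiency is the fold amplification per cycle (2.0 assumes perfect
    doubling, which is an idealization).
    """
    return efficiency ** (ct_gapdh - ct_target)


def fold_change(treated, untreated):
    """Ratio of GAPDH-normalized quantities, treated over untreated.

    Each argument is a (ct_target, ct_gapdh) pair.
    """
    return relative_quantity(*treated) / relative_quantity(*untreated)
```

For instance, a target that reaches threshold at cycle 23 in treated cells and cycle 25 in untreated cells, with GAPDH at cycle 20 in both, corresponds to a fourfold increase.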

pA⁺ RNA-seq library generation

All library construction and sequencing were paid services from the Beijing Genome Institute (BGI) in case of total pA⁺ RNA-seq and from Lexogen in case of the fractionated and exogenously expressing ZFC3H1 pA⁺ RNA-seq. Total RNA was extracted using TRIzol reagent according to the manufacturer’s instructions and transferred to BGI or Lexogen, which performed pA⁺ RNA selection using oligo-dT beads followed by strand-specific library preparation and sequencing.

3′ end-seq RNA library preparation

Triplicates of total 3′ end-seq libraries in presence or absence of EPAP, in EXOSC3–2×HA–dTAG cells, treated or not with dTAG^V-1 for 4 h were generated and processed as before⁵⁷. In brief, to discriminate pA⁺ from non-polyadenylated (pA⁻) RNA 3′ ends, 10 μg of RNA was split in two, subjecting one aliquot to in vitro polyadenylation by E. coli poly(A) polymerase (Invitrogen) in a 40 μl reaction at 30 °C (EPAP treated) according to the manufacturer’s protocol, while mock treating the other. Samples were then purified with the PureLink RNAmini kit (Invitrogen) and submitted for RNA 3′ end sequencing.

Analysis of RNA-seq data

Annotation of pA⁺ PTTs

Polyadenylated PTTs displaying sensitivity to ZFC3H1 and/or LENG8 depletion were annotated using a custom pipeline. In brief, starting from our transcriptome annotation of HeLa cells²⁹, transcription units were filtered to be longer than or equal to 10 kb. At these transcription units, the pA⁺ RNA-seq coverage for the ZFC3H1–2×HA–dTAG and LENG8–2×HA–dTAG cell lines treated with DMSO or dTAG was measured from TSS to transcription end site (TES) with a bin of 50 bp using rtracklayer⁵⁸. Gene bodies were then scaled to 2 kb, replicates averaged, a pseudocount of 1 added and the log₂ fold change (LFC) dTAG/Mock calculated for each cell line. Using data from each cell line separately, transcription units were then filtered to display increased signal (LFC > 0.2) within the first 200 bp of the scaled gene body and no such difference in the last 200 bp (LFC < 0.2). Of note, the lenient LFC reflected the accumulation of a PTT overlapping with the full transcription unit, and the present criteria filtered out cases where both the full-length transcription unit and the PTT displayed sensitivity to the specific depletions. Following this, the PTT-harbouring transcription units identified in ZFC3H1 and LENG8 depletions were pooled. For each of these, the maximum LFC value within the first 200 bp was determined and the last bin reaching 80% of this value along the scaled gene body was used to define an area to screen for the PTT TES. For each transcription unit, in the defined region ±5% of the transcription unit length, we measured, without binning or gene body scaling, the coverage from 3′ end RNA-seq, non-EPAP-treated data from ZFC3H1-mAID cells, mock or AID treated²⁸. LFCs were measured as before and 3′ end peaks with ZFC3H1 sensitivity (LFC > 1) over the areas of interest were called. In each area, the strongest peak was considered as the PTT TES. A manual curation of the identified PTTs was then performed to filter out artefacts.
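The LFC-based filter described above can be sketched as follows. This is an illustration, not the custom pipeline itself: it assumes that each scaled gene body is represented as a list of replicate-averaged per-bin coverage values, that the first and last 200 bp each correspond to four 50-bp bins, and that the LFC within a window is summarized by its mean. All names are hypothetical.

```python
import math


def lfc_profile(treated, mock, pseudocount=1.0):
    """Per-bin log2 fold change (treated over mock) after adding a pseudocount."""
    return [
        math.log2((t + pseudocount) / (m + pseudocount))
        for t, m in zip(treated, mock)
    ]


def is_ptt_candidate(treated, mock, window_bins=4, threshold=0.2):
    """True if signal increases at the start of the gene body but not at its end.

    treated and mock are scaled, replicate-averaged coverage profiles of equal
    length, ordered from TSS to TES.
    """
    lfc = lfc_profile(treated, mock)
    head = sum(lfc[:window_bins]) / window_bins
    tail = sum(lfc[-window_bins:]) / window_bins
    return head > threshold and tail < threshold
```

A unit whose coverage doubles only over its first bins passes the filter, whereas a unit whose coverage rises along its whole length does not, which mirrors the exclusion of cases where the full-length transcript is itself sensitive.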

Annotation of pA⁻ PTTs

At first introns, the LFCs of normalized coverage at individual positions were calculated upon EXOSC3 depletion in EPAP- and non-EPAP-treated conditions (M.L.R. et al., unpublished observations; Supplementary Fig. 5a). To ensure annotation of unadenylated ends only, the LFC of the non-EPAP condition was subtracted from that of the EPAP condition. The most downstream position, displaying a residual LFC >0.2 was subsequently considered as the TES of the longest unadenylated PTT of the locus.
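The subtraction and the search for the most downstream position can be illustrated with the short sketch below. It assumes two equal-length lists of per-position LFC values ordered in the direction of transcription; the function name and the input layout are hypothetical.

```python
def unadenylated_tes(lfc_epap, lfc_non_epap, threshold=0.2):
    """Index of the most downstream position with residual LFC above threshold.

    The residual is the EPAP LFC minus the non-EPAP LFC, so signal that is
    already visible without in vitro polyadenylation is removed. Returns None
    if no position exceeds the threshold.
    """
    residual = [e - n for e, n in zip(lfc_epap, lfc_non_epap)]
    for idx in range(len(residual) - 1, -1, -1):
        if residual[idx] > threshold:
            return idx
    return None
```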

Processing of total and fractionated pA⁺ RNA-seq data

The raw sequence reads received from service providers were first quality checked using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were then trimmed for adaptors and filtered using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Trimmed reads were mapped to hg38 using Hisat2 in paired-end mode⁵⁹. Mapped files were sorted and checked for pairing using SAMtools⁶⁰. Reads were then deduplicated using MarkDuplicates (Picard; https://broadinstitute.github.io/picard/) and further filtered to keep only unique mappers by using SAMtools. Relative sample size was then estimated by generating coverage counts using htseq-count⁶¹ (HTSeq-counts) over the Gencode annotation, to avoid any bias due to accumulation of short unstable transcripts present in our in-house annotation, and counts were then analysed by DESeq2³⁰ to define size factors. In the case of fractionated pA⁺ RNA-seq, size factors were measured separately for the nuclear and cytoplasmic fractions to avoid any compensation of compartment-specific phenotypes. Finally, reads were converted to bigwig files normalized to size factors using bamCoverage (Deeptools)⁶². The RNA sample of HeLa 4-h dTAG^V-1 replicate 4 from the nuclear fraction appeared to suffer from a strong technical issue arising from large ribosomal RNA contamination. This replicate was therefore eliminated from all analyses but is still listed as part of the Gene Expression Omnibus (GEO) dataset.
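For orientation, the median-of-ratios calculation that DESeq2 uses by default to estimate size factors can be sketched in plain Python as below. Whether default settings were used here is an assumption, and the function name and input layout (one row of raw counts per gene, one column per sample) are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts is a list of per-gene rows, each a list of raw counts per sample.
    Genes with a zero count in any sample are ignored, because their
    geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Dividing each sample's coverage by its size factor puts samples on a common scale; a sample sequenced twice as deeply as another receives a size factor twice as large.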

Differential expression analysis

RNA sensitivities to LENG8, ZFC3H1 and GANP depletion were defined based on DESeq2 differential expression analysis of total pA⁺ RNA-seq using untreated cells as controls. For each depletion, transcription units with adjusted P values < 0.1 in DESeq2³⁰ analysis were considered as measurable; those with an LFC over control > 0.5 were counted as upregulated, and those with an LFC over control < −0.5 were counted as downregulated. Owing to the strong correlation between coverage changes of LENG8 and ZFC3H1 depletion (Fig. 3c), ‘PAXT-sensitive’ transcription units were defined as upregulated in either of the two depletions. Plots exploring the relationship between exons or processed RNA lengths and PAXT sensitivities (Fig. 3e and Extended Data Figs. 6k and 8i,j) were based on our published in-house HeLa transcript annotation²⁹ and LFC coverage for all transcription units with adjusted P values < 0.1 in DESeq2.
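The thresholds above translate into a simple classification rule, sketched here with hypothetical function names. The inputs are assumed to be the LFC and adjusted P value reported by DESeq2 for one transcription unit in one depletion.

```python
def classify(lfc, padj, lfc_cutoff=0.5, padj_cutoff=0.1):
    """Label one transcription unit in one depletion."""
    if padj is None or padj >= padj_cutoff:
        return "not measurable"
    if lfc > lfc_cutoff:
        return "up"
    if lfc < -lfc_cutoff:
        return "down"
    return "unchanged"


def paxt_sensitive(leng8_label, zfc3h1_label):
    """A unit is PAXT-sensitive if upregulated in either depletion."""
    return "up" in (leng8_label, zfc3h1_label)
```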

Nuclear to cytoplasmic ratio measurements

Nuclear to cytoplasmic ratios were calculated for each transcription unit using non-log transformed counts of nuclear and cytoplasmic pA⁺ RNA coverages. Zero value counts were filled with minimal values.
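
A minimal sketch of this calculation is given below. The text does not specify which minimal value replaces zero counts; the sketch assumes, for illustration only, that zeros are replaced by the smallest non-zero value observed in the same fraction. All names are hypothetical.

```python
def fill_zeros(values):
    """Replace zero counts with the smallest positive value in the list.

    Assumption: 'minimal value' is taken as the smallest non-zero count
    of the same fraction; the original analysis may use another floor.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("no positive counts available to define a floor")
    floor = min(positive)
    return [v if v > 0 else floor for v in values]


def nuclear_to_cytoplasmic(nuclear, cytoplasmic):
    """Per-unit ratio of non-log-transformed nuclear to cytoplasmic counts."""
    nuc = fill_zeros(nuclear)
    cyt = fill_zeros(cytoplasmic)
    return [n / c for n, c in zip(nuc, cyt)]


# Hypothetical counts for three transcription units.
ratios = nuclear_to_cytoplasmic([10.0, 0.0, 4.0], [5.0, 2.0, 0.0])
```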

Transcription unit clustering based on fractionated pA⁺ RNA-seq behaviour

First, the average LFC coverage, as measured by rtracklayer, was calculated for the total, nuclear and cytoplasmic fractions. For the fractionated sequencing, all protein depletions were compared to the maternal HeLa cell line treated with dTAG^V-1. The LFC upon ZFC3H1 depletion was then used separately in the nuclear and cytoplasmic fractions, at each transcription unit, to define a behaviour as ‘up’ (>0.5), ‘down’ (<−0.5), or ‘unaffected’. Nine clusters were then generated corresponding to all possible combinations (nuclear ‘up’/cytoplasmic ‘up’, nuclear ‘up’/cytoplasmic ‘unaffected’, etc.). Small clusters (with fewer than 200 transcription units) were removed from the final heat map in Extended Data Fig. 7c.
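
The cluster assignment can be sketched as follows, assuming that each transcription unit comes with a nuclear and a cytoplasmic LFC upon ZFC3H1 depletion. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def behaviour(lfc, cutoff=0.5):
    """Label one LFC value as 'up', 'down' or 'unaffected'."""
    if lfc > cutoff:
        return "up"
    if lfc < -cutoff:
        return "down"
    return "unaffected"


def cluster_units(lfc_by_unit, min_size=200):
    """Group units by their (nuclear, cytoplasmic) behaviour pair.

    lfc_by_unit maps a unit name to a (nuclear LFC, cytoplasmic LFC) tuple.
    Up to nine clusters are possible; clusters with fewer than min_size
    units are dropped.
    """
    clusters = defaultdict(list)
    for unit, (nuc_lfc, cyt_lfc) in lfc_by_unit.items():
        clusters[(behaviour(nuc_lfc), behaviour(cyt_lfc))].append(unit)
    return {key: units for key, units in clusters.items() if len(units) >= min_size}


# Hypothetical example with a lowered size threshold for readability.
example = {"tu1": (1.2, 0.1), "tu2": (0.9, -0.2), "tu3": (-0.7, -0.9)}
kept = cluster_units(example, min_size=2)
```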

Analysis of transcripts with retained introns

For every intron, unspliced reads spanning the 5′ and 3′ splice sites were counted using custom code relying on SAMtools⁶⁰, and every intron with at least one unspliced read at both junctions in unperturbed HeLa cells was considered retained. The genomic coordinates of detained introns were obtained from Boutz et al.³⁶. As this annotation originates from four combined cell lines, we first merged it with our in-house HeLa-specific annotation²⁹. Considering the generally unspliced nature of detained introns, we first filtered out those that totally or partially overlapped introns from our annotation. We then further filtered detained introns to retain only those fully included in our exons, to avoid overhang at the TSS or TES due to alternative isoforms. Finally, the few cases where a detained intron started or ended a transcription unit, without being preceded or followed by an exon, were filtered out. Similarly, reads spanning 5′ and 3′ splice junctions for both retained and detained introns were counted in the total, nuclear and cytoplasmic fractions. Introns for which both splice junctions showed an increase upon ZFC3H1 or LENG8 depletion were counted as PAXT-sensitive.
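
The two decision rules in this section can be sketched as below. The read counting itself (custom code relying on SAMtools) is not reproduced; the sketch starts from per-junction unspliced read counts. The text gives no magnitude threshold for an 'increase', so any increase over control is assumed here, and the counts are assumed to be comparable between samples. All names are hypothetical.

```python
def is_retained(reads_5ss, reads_3ss, min_reads=1):
    """Retained: at least one unspliced read at both splice junctions."""
    return reads_5ss >= min_reads and reads_3ss >= min_reads


def is_paxt_sensitive_intron(control, leng8_depleted, zfc3h1_depleted):
    """Each argument is a (5' junction, 3' junction) pair of unspliced counts.

    Assumption: an intron is sensitive when both junctions increase over
    control in at least one of the two depletions.
    """
    def both_increase(depleted):
        return depleted[0] > control[0] and depleted[1] > control[1]

    return both_increase(leng8_depleted) or both_increase(zfc3h1_depleted)


# Hypothetical counts for one intron.
retained = is_retained(reads_5ss=3, reads_3ss=1)
sensitive = is_paxt_sensitive_intron((3, 1), (2, 4), (6, 5))
```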

Metagene profiles, heat maps and display of sequencing information

Metagene profiles and heat maps were produced using custom R and Python scripts. In brief, the rtracklayer R package was used to collect read coverage values for the window ±500 bp relative to the TSS or TES, or over specific exonic/intronic features. Coverage values were then binned in 50 nt bins and log₂-transformed after the addition of a pseudocount of 1. This measurement of coverage was then used to compute LFC values and generate subsequent plots. Heat maps were made using custom R or Python code based on the R package ComplexHeatmap⁶³ or Seaborn⁶⁴, respectively. The mean of coverage values across transcription units over each bin was also computed and plotted as metagene profiles using custom R code. A 95% confidence interval of the mean coverage was displayed for each sample and was measured through 50 rounds of bootstrap sampling with replacement. Aggregate plots and heat maps of sequencing data were generated based on BigWig files using customized R scripts. Genome browser views based on BigWig files were generated using the R package seqNdisplayR⁶⁵.
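
The binning and bootstrap steps can be illustrated with the following plain-Python sketch; the published analysis used custom R and Python scripts that are not reproduced here. Two choices are assumptions made for illustration: coverage is averaged within each bin, and the confidence interval is taken as percentiles of the bootstrap means. Function names are hypothetical.

```python
import math
import random


def bin_log2(coverage, bin_size=50, pseudocount=1.0):
    """Average per-nucleotide coverage in fixed bins, then log2(x + pseudocount)."""
    bins = []
    for start in range(0, len(coverage), bin_size):
        chunk = coverage[start:start + bin_size]
        bins.append(math.log2(sum(chunk) / len(chunk) + pseudocount))
    return bins


def bootstrap_ci(values, n_boot=50, alpha=0.05, seed=0):
    """Percentile confidence interval of the mean from n_boot resamplings
    with replacement (values = one bin across transcription units)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[math.floor((alpha / 2) * (n_boot - 1))]
    upper = means[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# A 1,000-nt window (±500 bp) of hypothetical flat coverage gives 20 bins.
profile = bin_log2([1.0] * 1000)
# Hypothetical binned values of one bin across six transcription units.
ci = bootstrap_ci([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
```

With only 50 resamplings the interval limits are coarse estimates; the fixed seed is there so that the sketch gives the same interval on each run.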

In vivo RNA-binding assays

Cells expressing wild-type or R563A mutant LENG8–3×Flag were induced with doxycycline for 24 h and crosslinked with 150 mJ cm⁻² of 254 nm UV light using a Stratalinker 2400 (Stratagene). Lysate preparation, anti-Flag immunoprecipitation, RNase I (Thermo Scientific) and TURBO DNase (Thermo Scientific) treatments, radiolabelling using γ-³²P ATP (PerkinElmer) and PAGE of RNA–protein complexes were performed as described³⁷. Phosphor imaging of gels with radiolabelled samples was performed using a Typhoon scanner (Amersham).

iCLIP experiments

iCLIP was performed as previously described³⁷ with minor modifications. In brief, HeLa or HeLa LENG8–mAID–3×Flag cells at ~80% confluency were UV-crosslinked at 254 nm with a dose of 150 mJ cm⁻² using a Stratalinker 2000. One (UAP56) or two (LENG8–3×Flag) 15-cm plates were used per immunoprecipitation and experiments were performed in duplicate. Whole-cell extracts were sonicated for 30 s and treated with TURBO DNase and RNase I prior to immunoprecipitation, using anti-UAP56 (E7W7M, Cell Signaling Technology) or anti-Flag M2 antibodies immobilized on Protein G Dynabeads. Protein–RNA complexes were subjected to high-salt washes, including freshly added 2 M urea in the wash buffer, and separated by PAGE. RNA was subsequently extracted and iCLIP libraries were constructed³⁸ before sequencing on a NovaSeq platform (Lexogen).

Crosslink positions were mapped to the hg38 human genome as previously described⁶⁶. Counts were then generated over our in-house HeLa transcript annotation²⁹ and normalized to transcript abundance through division by the respective transcript's mean log₂(coverage) across the averaged replicates of the no dTAG^V-1 EXOSC3 control sample (M.L.R. et al., unpublished observations). To display iCLIP log₂-transformed coverages across PAXT-sensitive or -insensitive transcript classes (ncRNAs, PROMPTs and PTTs), typical exosome targets were classified based on their sensitivity to ZFC3H1 depletion in pA⁺ RNA-seq using LFC (>0.5 for ZFC3H1-sensitive and <0.3 but >−0.3 for insensitive transcripts) and curated to remove overlapping transcripts. Comparisons were further restricted to monoexonic ncRNAs.
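
The normalization and the sensitivity classes can be sketched as follows. The handling of transcripts whose control log₂(coverage) is zero or negative is not described in the text; returning no value for them is an assumption of this sketch, and all names are hypothetical.

```python
def normalize_iclip(count, control_log2_coverage):
    """Divide an iCLIP count by the transcript's mean log2(coverage) in
    the control RNA-seq sample.

    Assumption: transcripts with non-positive control log2(coverage)
    are left unnormalized (None) rather than divided.
    """
    if control_log2_coverage <= 0:
        return None
    return count / control_log2_coverage


def zfc3h1_class(lfc):
    """Sensitive if LFC > 0.5; insensitive if -0.3 < LFC < 0.3.

    Transcripts falling between these ranges, or decreasing by more
    than 0.3, belong to neither class.
    """
    if lfc > 0.5:
        return "sensitive"
    if -0.3 < lfc < 0.3:
        return "insensitive"
    return None


# Hypothetical transcript: 40 crosslink counts, control log2(coverage) of 8.
normalized = normalize_iclip(40, 8.0)
label = zfc3h1_class(0.9)
```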

Puromycin labelling assays

HeLa and LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG and GANP–2×HA–dTAG-expressing cells were grown in the presence of 500 µM dTAG^V-1 for an additional 0, 4 or 24 h. Before collection by snap freezing, 5 µg ml⁻¹ of puromycin was added to the cell medium for 30 min. Puromycin incorporation was assessed by western blotting analysis.

Whole-cell proteome analysis using pulsed SILAC

HeLa and LENG8–2×HA–dTAG, ZFC3H1–2×HA–dTAG and GANP–2×HA–dTAG cell lines were initially cultured for 24 h in DMEM medium containing 73 mg l⁻¹ l-lysine HCl and 28 mg l⁻¹ l-arginine HCl (Sigma) (Lys0/Arg0 medium). Cells were then pre-treated with either 500 µM dTAG^V-1 or an equivalent volume of DMSO for 4 h. Following this, the medium was switched to medium containing 73 mg l⁻¹ l-lysine HCl (¹³C₆¹⁵N₂) and 28 mg l⁻¹ l-arginine HCl (¹³C₆¹⁵N₄) (Lys8/Arg10 medium) with either dTAG^V-1 or DMSO, and cells were cultured for an additional 24 h under the same conditions. In parallel, a matched set of cells was maintained in Lys0/Arg0 medium under the same conditions (dTAG^V-1 or DMSO). After treatment, cells were collected by snap freezing, and SILAC sample preparation and mass spectrometry were carried out as described⁴³. SILAC ratios of Lys8/Arg10 versus Lys0/Arg0 peptides were calculated for each sample. To calculate differential protein expression, the DEP package⁴⁴ was used to analyse differences in mean LFQ intensities of Lys8/Arg10-labelled peptides.

Statistics and reproducibility

In addition to the built-in statistical tests provided by software packages such as Zen Blue, DESeq2 and DEP, further statistical analyses were performed using two-sided t-tests or Welch’s t-test when group sizes differed substantially. Pearson correlation coefficients were used to assess the correlation between LFCs following ZFC3H1, LENG8 or GANP depletion.
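
For reference, the statistic behind Welch's t-test can be written out in a few lines. The sketch below computes the t statistic and the Welch–Satterthwaite degrees of freedom for two groups with unequal sizes or variances; it stops short of the P value, which requires the Student t distribution and would normally come from a statistics package. It does not represent the software used in this study, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's t-test.

    Uses sample variances (n - 1 denominator) and does not assume
    equal variances or equal group sizes.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical LFC values for two groups of unequal size.
t_stat, dof = welch_t([0.9, 1.1, 1.4, 0.7, 1.0], [0.2, 0.4, 0.1])
```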

Box plots show the median (centre line) and interquartile range (box limits), with whiskers extending to the most extreme data points within 1.5× the interquartile range; P values of statistical tests are indicated directly on the plots. Outlier dots were excluded from the visual display for clarity but were included in the statistical analysis.

All real-time qPCR assays, RNA-seq, IP–MS and SILAC whole-cell proteomics were performed using three independent biological replicates, each comprising multiple technical measurements. Immunofluorescence staining (at least 100 cells were analysed per condition) and RNA-binding assays were repeated two times using independent batches of cells and all attempts at replication were successful. iCLIP libraries were prepared using two biological replicates. All other experiments, except cryo-EM data collection and processing, were performed at least three times with similar results and all attempts at replication were successful.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.