Friday, August 1, 2025

Respiratory viral infections awaken metastatic breast cancer cells in lungs

Mouse strains, influenza infection and antibody treatments

Transgenic mouse models of breast cancer, using mouse mammary tumour virus (MMTV) long terminal repeats, are widely used. In brief, MMTV-PyMT and MMTV-erbB2/neu/Her2 (MMTV-Her2) mice express the oncogenes polyoma virus middle T antigen (PyMT) and rat Erbb2 (encoding HER2), respectively, under the control of the MMTV promoter, which confers expression in the mammary epithelium, as described elsewhere10,13,48. The MMTV-PyMT transgene is congenic in the FVB mouse background (a gift from William Muller). Given that the MMTV-PyMT mice exhibit substantial lung tumour burden within a few months of life, we limited our analyses to newly awakened disseminated cancer cells (DCCs) in this model (forming micrometastases, defined as lesions with an area of less than 0.03 mm²). The MMTV-Her2 transgene is congenic in the FVB (Jackson Laboratory, 002376) and C57BL/6 (a gift from Ramon Parsons, congenic in C57BL/6J by backcrossing from the FVB background26) backgrounds. MMTV-Her2 mice (FVB) were crossed with IL-6-knockout (KO) mice9,25. For an orthotopic model of breast cancer, EO771 breast cancer cells49 (a gift from Diana Cittelly) were injected into the fourth right and left mammary fat pads at 2 × 10⁵ or 1 × 10⁶ cells per fat pad.

Eight-week-old MMTV-PyMT and 12- to 14-week-old MMTV-Her2 female mice were infected with 500 EIU Puerto Rico A/PR/8/34 H1N1 IAV through intranasal administration in 50 μl PBS. For viral administration, mice were anaesthetized using 5% induction isoflurane and 2% maintenance, performed with a SomnoFlo Low-Flow electronic vaporizer machine in an induction chamber. After ensuring adequate anaesthesia with slow and deep breathing, droplets of viral fluid were placed on the mouse’s nostrils. The mouse inhaled the fluid through the nostrils. Once the fluid had been inhaled, the mouse was placed on a heating pad to recover.

For immune-cell depletion experiments, mice were injected intraperitoneally with rat IgG as a control (MP Biochemicals, MPBio 0855951), 100 μg anti-CD4 (Bio X Cell, clone GK1.5, BP003-1), or 100 μg anti-CD8 (Bio X Cell, clone 2.43) 1 day before IAV infection and every 6 days afterwards, or 200 μg anti-Ly6G (Bio X Cell, clone 1A8, BP0075-1) on the day of the influenza virus infection, then at 24 h and every other day afterwards, until euthanasia. For 5-ethynyl-2′-deoxyuridine (EdU) incorporation, mice were injected with 50 mg per kg EdU (Sigma Aldrich, BCK488-IV-FC-S) 4 h before euthanasia.

For both MMTV-PyMT and MMTV-Her2 mice, as a humane end point, mice were euthanized when the tumour reached 20 mm in any one dimension, tumours were ulcerated or infected, or if there was a major sign of discomfort, as determined by the institutional veterinarian. Mice were monitored every other day during the first week or until the tumour was palpated, and daily afterwards until the mice needed to be euthanized. Veterinary technicians in the institutional facility monitored the mice daily.

All mice were co-housed in specific pathogen-free animal facilities, maintained at 21 °C (±1 °C) and 35% humidity with a 14 h:10 h light:dark cycle (light 06:00–20:00). All the mice were backcrossed in the C57BL/6J background for more than 10–12 generations. Only female mice were used for the studies. The average age of the mice was 12–24 weeks. An approved measure of CO2 followed by cervical dislocation was used for euthanasia.

The University of Colorado Institutional Animal Care and Use Committee (IACUC) reviewed and approved all animal experiments (including humane end points described above), which were conducted in accordance with the NIH Guidelines for the Care and Use of Laboratory Animals.

SARS-CoV-2 MA10 propagation

Mouse-adapted SARS-CoV-2 MA10 (BEI Resources, NR-55329) was propagated in Vero E6 cells (ATCC CRL-1586) as previously described42. In brief, low-passage Vero E6 monolayers were inoculated at a multiplicity of infection of 0.01 with SARS-CoV-2 MA10. When Vero E6 monolayers exhibited 70–75% cytopathic effect (2–3 dpi), supernatants were collected, clarified by centrifugation, supplemented with an additional 10% FBS, aliquoted and stored at −80 °C. SARS-CoV-2 titres were determined by plaque assay on Vero E6 cells. Vero E6 cells were maintained at 37 °C in Dulbecco’s Modified Eagle medium (DMEM, HyClone 11965-084) supplemented with 10% fetal bovine serum (FBS), 10 mM HEPES (pH 7.3) and 100 U ml−1 of penicillin-streptomycin.

SARS-CoV-2 MA10 infection of mice

MMTV-Her2 female mice (in both C57BL/6J and FVB backgrounds) at 14–19 weeks of age were anaesthetized by intraperitoneal injection of a mixture of ketamine (80 mg per kg) and xylazine (7.5 mg per kg) in a volume of 100–200 μl. Fully anaesthetized mice were inoculated intranasally with 10⁴ PFU of SARS-CoV-2 MA10 diluted in PBS supplemented with 1% bovine calf serum by administration of 25 μl of inoculum in each nostril for a total volume of 50 μl. Mouse weights were collected daily for 15 days, and mice inoculated with SARS-CoV-2 MA10 exhibited weight loss beginning at 2 dpi, with greatest loss achieved at 3–4 dpi, as previously reported42. As controls, MMTV-Her2 mice were mock inoculated with 50 μl of PBS/1% bovine calf serum.

SARS-CoV-2 MA10 viral titre from lungs

MA10 viral titre was determined as previously described50. Lung superior lobes were homogenized, serially diluted in DMEM with 2% FBS, HEPES, penicillin-streptomycin and incubated on Vero E6 cells for 1 h at 37 °C. Cells were then overlaid with 1% (w/v) methylcellulose in MEM with 2% FBS at 37 °C for 3 days. Overlays were removed afterwards, and the plates were fixed with 4% paraformaldehyde for 20 min at room temperature. Fixed plates were stained with crystal violet (0.05% w/v) in 20% methanol for 10 min. Infectious viral titres were determined by manually counting the plaques formed.
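The arithmetic that turns plaque counts into a titre is not spelled out above. The following is a minimal sketch of the conventional plaque-assay calculation, not the authors' own script; the function name, the inoculum volume and the example counts are hypothetical.

```python
def plaque_titre_pfu_per_ml(plaque_count, dilution_factor, inoculum_volume_ml):
    """Conventional plaque-assay titre: plaques / (dilution x volume plated).

    dilution_factor is the fraction of the original homogenate in the well,
    for example 1e-3 for a 1:1,000 dilution. All example inputs below are
    hypothetical and are not taken from the study.
    """
    if dilution_factor <= 0 or inoculum_volume_ml <= 0:
        raise ValueError("dilution factor and volume must be positive")
    return plaque_count / (dilution_factor * inoculum_volume_ml)


# Hypothetical well: 40 plaques at a 1:1,000 dilution, 0.2 ml plated.
titre = plaque_titre_pfu_per_ml(plaque_count=40,
                                dilution_factor=1e-3,
                                inoculum_volume_ml=0.2)
print(f"{titre:.3g} PFU per ml")
```

In practice only wells with a countable number of plaques would be used, and the result could be further normalized to the mass of tissue homogenized.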

Immunohistochemistry and immunofluorescence staining

Lungs and mammary glands were collected and fixed in 10% neutral buffered formalin overnight, transferred to 70% ethanol the next day and then embedded in paraffin. Tissues were sectioned (5 μm) and used for immunohistochemistry (IHC) and immunofluorescence. Slides were deparaffinized in three incubations of 15 min in Histo-clear (Fisher Scientific, 50-899-90147) then descending 10-min ethanol incubations: three at 100%, followed by 95% and 70% followed by 10 min of H2O incubation. Heat-induced antigen retrieval was done for 10 min in a pressure cooker in citrate buffer (10 mM citric acid, pH 6.0). For IHC, samples were incubated in 1% H2O2 for 15 min to block endogenous peroxidase activity. Permeabilization was done using 0.1% normal goat serum in 0.4% Triton-X 100 in PBS for 30 min. Sections were blocked for 1 h at room temperature with blocking solution (Abcam, AB64226) containing MOM blocking reagent (Vector Laboratories, MKB2213-1), incubated with primary antibodies (Supplementary Information Table 1) at 4 °C overnight in antibody diluent (Abcam, 64211), then washed 3 times for 30 min each in 0.1% Triton-X 100 in PBS. For IHC samples, sections were incubated in ImmPRESS HRP goat anti-rabbit or rat IgG polymer detection kit (Vector Laboratories, MP-7451/MP7404) and ImmPACT DAB substrate, peroxidase HRP (Vector Laboratories, SK4105) according to the manufacturer’s instructions. The IHC slides were mounted using micromount mounting medium (StatLab, MMC0126). For immunofluorescence, sections were incubated with secondary antibodies for 1 h at room temperature in antibody diluent (Abcam, 64211). Sections were then washed in 0.1% Triton-X 100 in PBS 3 times for 30 min each and were mounted using fluoroshield mounting media with DAPI (Abcam, 104139). Immunofluorescence images were collected using a Zeiss Axiovert 200-m fluorescence microscope. IHC images were collected using a Keyence BZ-X800 microscope.

Section staining, image capturing and image analysis were done manually using ImageJ and were carried out by a researcher who was blinded to sample identities. Subsequent grouping and graphing were done by a different lab member who was unblinded after image analyses and quantification were completed.

Assessment of collagen deposition

Collagen deposition was assessed using Masson’s Trichrome stain. The intensity of the stained areas was assessed using FRIDA software as described elsewhere51.

BALF processing

Bronchoalveolar lavage was done using 1 ml PBS (ThermoFisher, 14190-144) after mice were euthanized. BALF was collected and centrifuged at 500g for 5 min at 4 °C. Supernatant was flash frozen in liquid nitrogen and stored at −80 °C until analysis. Red blood cells were lysed using haemolytic buffer (150 mM NH4Cl, 1 mM NaHCO3, 1.1 mM Na2EDTA) for 3 min, flow buffer (PBS with 2% FBS and 2 mM EDTA) was added and cell suspensions were centrifuged at 500g for 5 min at 4 °C. Cells were resuspended in flow buffer and counted manually.

Cytokine detections

Cytokines in the BALF were measured using custom-made high-sensitivity multiplex assays from Meso Scale Discovery according to the manufacturer’s instructions.

Flow-cytometric analyses

Cells recovered from BALF were stained with antibodies (Supplementary Information Table 1). Alternatively, whole lungs were taken and digested using a method described elsewhere52. In brief, lung digestion mix (1.5 mg ml−1 collagenase A (Sigma Aldrich, COLLA-RO), 0.4 mg ml−1 deoxyribonuclease I (Worthington, LS002139), 10 mM HEPES pH 7.2, 5% FBS) was injected into the lungs through cannulae and lungs were incubated in a shaking incubator at 37 °C for 30 min followed by vigorous vortexing. Digested lungs were passed through a 50 μm cell strainer and red blood cells were lysed using haemolytic buffer for 3 min, flow buffer was added and cell suspensions were centrifuged at 500g for 5 min at 4 °C. Single cells were resuspended in flow buffer and stained with antibodies (Supplementary Information Table 1) for flow cytometry. For mitochondrial mass analysis, lung cell suspensions were stained with Mitotracker green (Invitrogen, M7514) for 30 min at 37 °C. Staining for CD4 and CD8 was performed for the last 5 min of the incubation at 37 °C, and cells were immediately washed for flow cytometry analysis. Data were collected on an LSR II flow cytometer (BD Biosciences) or Aurora (Cytek) and analysed using FlowJo software v.10. CD4 and CD8 cell populations were well defined (Extended Data Fig. 7h). For cell sorting of DCCs, lung cell homogenates were obtained from PBS or IAV (9 dpi)-infected MMTV-Her2 mice using a Lung Dissociation Kit Mouse according to the manufacturer’s protocol (Miltenyi, 130-095-927). The single-cell suspensions were treated for red blood cell lysis. Single-cell suspensions were pre-incubated (5 min) with anti-CD16/CD32 Fc-Block (BD Biosciences, 553141) followed by staining for CD45 and HER2 and sorting using an Astrios EQ flow cytometer (Beckman Coulter). DCCs were gated on CD45neg HER2+. Sorted DCCs were used for bulk RNA-seq (described below).

Ex vivo analysis of lung CD8+ and CD4+ cells

CD8+ cells were isolated from digested lungs using positive selection with CD8α (Ly-2) microbeads (Miltenyi, 130-117-044) according to the manufacturer’s protocol. For CD8+ cell-mediated cytotoxicity experiments, Her2 cells isolated from mammary glands of MMTV-Her2 mice and expanded in culture, or immortalized PyMT cells (MET-1) isolated from mammary glands of MMTV-PyMT mice, were plated 2 days before the killing assay. Lung CD8+ cells were isolated 15 days after IAV infection from wild-type mice, MMTV-Her2 mice or MMTV-Her2 mice treated with anti-CD4 antibodies (starting the day before infection). Lung CD8+ cells (pooled from 3–4 mice) were added to the cancer cell cultures at a 1:1 effector:target ratio. Then, 48 h later, co-cultures were washed (to remove CD8+ cells) and trypsinized, and live cancer cells were counted. Isolated lung CD8+ cells were restimulated using anti-CD3/anti-CD28 coated beads53, and 20 h later, supernatant was collected and used for detection of IFNγ by ELISA, as previously described54, using anti-mouse IFNγ capture antibody (Biolegend, 505702) and biotinylated anti-mouse IFNγ antibody (Biolegend, 505804).

CD4+ cells were isolated from IAV-infected wild-type and Her2 mice lung cell homogenate using positive selection with CD4 Microbeads (L3T4) (Miltenyi, 130-117-043) according to the manufacturer’s protocol. Cell pellets were used for whole-cell lysates for western blot analysis.

Western blot analysis

Whole-cell extracts were prepared from CD4+ cells isolated from the lungs of wild-type and MMTV-Her2 mice (FVB) infected with IAV, following methods described elsewhere55. For western blot analysis, the following antibodies were used: β-actin monoclonal antibody (AC-15) (Invitrogen, AM4302), anti-DUSP5 (Invitrogen, PA5-85961), anti-rabbit HRP (Jackson ImmunoResearch Laboratories, 111-035-144) and anti-mouse HRP (Jackson ImmunoResearch Laboratories, 115-035-166).

Fixed single-cell RNA-seq

Single cells were generated as described in the section Flow-cytometric analyses. Cells exhibiting greater than 80% viability were fixed in a 4% formaldehyde solution using the Chromium Next GEM Single Cell Fixed RNA Sample Preparation Kit (10X Genomics). The whole-transcriptome probe pairs (10X Genomics) were added to the fixed single-cell suspensions to hybridize to their complementary target RNA during an overnight incubation at 42 °C. After hybridization, unbound probes were removed by washing. The fixed and probe-hybridized single-cell suspensions were loaded onto a Chromium X (10X Genomics) microfluidics instrument to generate partitioned nanolitre-scale droplets in oil emulsion. The target was for each droplet to contain a barcoded gel bead, a single cell and enzyme Master Mix (10X Genomics) for probe pair ligation and gel bead primer barcode extension. The droplets in oil emulsion were placed in a thermal cycler for 60 min at 25 °C, 45 min at 60 °C and 20 min at 80 °C. The single-cell barcoded, ligated probe products underwent library preparation using standard 10X Genomics protocols in preparation for Illumina next-generation sequencing. The gene expression libraries derived from the single-cell barcoded, ligated probe products were sequenced as paired-end 150-base pair reads on an Illumina NovaSeq 6000 (Illumina) at the University of Colorado Genomics Shared Resource at a target depth of 20,000 reads per cell for all samples.

Data processing for scRNA-seq analysis

The scRNA-seq fastq files were processed using Cell Ranger software (v.7.1.0, 10X Genomics)56 to assign reads to genes based on Cell Ranger’s Chromium mouse transcriptome probe set (v.1.0.1). The counts were analysed using the Seurat R package57. Genes found in fewer than 10 cells were excluded. Cells were excluded if they contained fewer than 201 genes, more than 7,500 unique molecular identifiers (UMIs) or greater than 2.5% of mitochondrial UMIs. The R package scDblFinder58 was used to identify and subsequently remove doublets from the data. As well as removing cells identified as doublets, preliminary clustering was used in sequential fashion to remove clusters with greater than 50% of cells being identified as doublets. After downstream processing, clusters were filtered if they contained canonical markers from multiple cell types. The data were then depth-normalized followed by natural-log transformation. The top 2,000 most variable genes were used to scale the data while regressing out cell cycle S/G2M difference, total UMI and percentage of mitochondrial UMIs.
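The per-cell exclusion thresholds quoted above can be restated as a single predicate. This is a language-neutral sketch in plain Python rather than the Seurat code the analysis actually used; the function and parameter names are hypothetical, and only the three numeric cut-offs come from the text.

```python
def passes_qc(n_genes, n_umis, n_mito_umis,
              min_genes=201, max_umis=7500, max_mito_fraction=0.025):
    """Return True if a cell survives the thresholds quoted in the text.

    A cell is excluded if it has fewer than 201 detected genes, more than
    7,500 UMIs, or more than 2.5% of its UMIs from mitochondrial genes.
    """
    if n_umis <= 0:
        return False  # an empty barcode cannot pass any filter
    mito_fraction = n_mito_umis / n_umis
    return (n_genes >= min_genes
            and n_umis <= max_umis
            and mito_fraction <= max_mito_fraction)


# Hypothetical cells: (detected genes, total UMIs, mitochondrial UMIs).
cells = {
    "cell_a": (1500, 4000, 40),   # passes all three filters
    "cell_b": (150, 400, 2),      # too few genes
    "cell_c": (3000, 9000, 50),   # too many UMIs
    "cell_d": (1200, 3000, 150),  # 5% mitochondrial UMIs
}
kept = [name for name, counts in cells.items() if passes_qc(*counts)]
print(kept)
```

Doublet removal and the cluster-level filters described in the same paragraph are separate steps and are not covered by this predicate.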

Principal component analysis was performed using the top 2,000 variable genes. Principal components (n = 30) that captured most of the variation were then included in further data-processing steps. Clusters were identified (at a resolution of 1.5) using the K-nearest neighbours algorithm. Clusters were annotated to cell types using enriched canonical markers and ORA59 with gene sets from the MSigDB60 and the PanglaoDB61. Broad T lymphocytes were identified and subclustered separately to increase cell-type resolution. Differentially expressed genes were identified using the Wilcoxon rank sum test within each of the cell types identified for the indicated comparisons. GSEA was done using the clusterProfiler R package (v.4.0.5)62 and the Benjamini–Hochberg method was used to calculate the adjusted P values. ORA62 was performed on the top 200 differentially expressed genes using the Hallmark, KEGG and GO Biological Processes gene set collections of the MSigDB60. Plots were produced using the Seurat57, ggplot263, ggpubr64 and pheatmap65 R packages.

Mitochondrial-specific scRNA-seq analysis

We analysed the log2(fold change), adjusted P values and raw P values generated from scRNA-seq data to compare the following experimental groups: HER2 + IAV versus HER2 + PBS, HER2 + IAV versus wild type + IAV, and HER2 + IAV + anti-CD4 versus HER2 + IAV. To focus on mitochondrial functions, we used our custom mitochondrial pathway gene lists, originally published in ref. 66. Specifically, we examined overlaps between mitochondrial OXPHOS genes and our curated innate immune pathways associated with mitochondrial activity. The results were visualized as heatmaps using the pheatmap package (v.1.0.12). Pathway analysis was done using fast GSEA67 with custom gene-set files previously curated in ref. 66. All samples were compared with controls, and the ranked list of genes was defined using the −log10(P value) × log2(fold change). Statistical significance was assessed through 1,000 permutations of the gene sets60. Results are reported with a false discovery rate (FDR) threshold of less than 0.25 and visualized as heatmaps generated with the pheatmap package (v.1.0.12).
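The ranking metric stated above, −log10(P value) × log2(fold change), can be illustrated as follows. This is a minimal sketch assuming per-gene statistics are already available; the gene names, values and function names are hypothetical, and the enrichment step itself (fast GSEA) is not reproduced.

```python
import math


def rank_score(p_value, log2_fold_change):
    """Ranking metric from the text: -log10(P value) x log2(fold change)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value) * log2_fold_change


def ranked_gene_list(stats):
    """Order genes from most up-regulated to most down-regulated.

    stats maps gene name -> (p_value, log2_fold_change).
    """
    scores = {gene: rank_score(p, lfc) for gene, (p, lfc) in stats.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical differential-expression results.
example = {
    "GeneA": (1e-4, 2.0),    # strongly up, highly significant
    "GeneB": (0.5, 0.1),     # essentially unchanged
    "GeneC": (1e-3, -1.5),   # down-regulated
}
ranking = ranked_gene_list(example)
for gene, score in ranking:
    print(gene, round(score, 3))
```

Because the fold change carries the sign, significant up-regulated genes land at the top of the list and significant down-regulated genes at the bottom, which is the ordering a pre-ranked enrichment test expects.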

RNA-seq analysis of DCCs

RNA was isolated from sorted DCCs (flow sorting described above) with the RNeasy plus micro kit (Qiagen, 74034) and libraries were prepared using the SMART-Seq mRNA LP kit (Takara Bio, 634762) following the manufacturer’s instructions. Pooled libraries were sequenced on the NovaSeq X (Illumina). The fastq files were processed using the nf-core rnaseq pipeline (v.3.12.0)68. Reads were trimmed with Cutadapt69 and aligned to the mouse transcriptome (GRCm38, Ensembl release 102) using STAR (v.2.7.9a)70 and quantified using Salmon (v.1.10.1)71. Differential expression analysis was done using limma72 with the voom method followed by GSEA as described above.

Influenza virus RNA quantification

Whole lung tissue was homogenized and RNA was isolated using TRIzol/chloroform extraction following the manufacturer’s protocol (ThermoFisher and MilliporeSigma, respectively). RNA (1 μg) was reverse transcribed with an iScript cDNA synthesis kit (Bio-Rad Laboratories) and the viral load was determined by qPCR for the PR8 acid polymerase gene compared with a standard curve of known PR8 acid polymerase gene copy numbers as previously described73.
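Quantification against a standard curve is commonly done by fitting a line to threshold cycle (Ct) versus log10 copy number and inverting it for unknown samples. The sketch below assumes that approach; the text does not state the exact fitting procedure, and the standards and Ct values shown are hypothetical.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line: Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of a plasmid standard.
standards_log10 = [3, 4, 5, 6, 7]
standards_ct = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
unknown_copies = copies_from_ct(25.05, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(unknown_copies))
```

A slope near −3.3 corresponds to roughly 100% amplification efficiency, which is a useful sanity check on any real standard curve.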

HER2+ mammospheres and EO771 organoid culture

FVB-MMTV-Her2/Neu female mice 14–18 weeks old were used as early (‘premalignant’) stage mice. Mice were euthanized using isoflurane and cervical dislocation. Whole mammary glands were minced and digested in 0.15% Collagenase 1A (Sigma, C-9891), 2.5% bovine serum albumin and 200 U DNase I (Stemcell Technologies, NC9007308) solution at 37 °C with agitation for 30 min. Red blood cell lysis buffer (eBioscience, 4333-57) was used for 2 min at room temperature to remove blood cells. Cells were filtered through a 40-μm filter. Then 3 × 10⁵ cells per well were seeded in six-well ultralow-adhesion plates in 1 ml mammosphere media (DMEM/F12 (Gibco, 11320-082), 1× B27 supplement (Gibco, 17504-044), 10 ng ml−1 EGF (Peprotech, AF-100-15-A), 50 U penicillin-streptomycin (Thermo Fisher, 15070-063)). An additional 1 ml of mammosphere medium was added 24 h after seeding. At day 4 after seeding, cells were treated with either PBS or 10 ng ml−1 IL-6 (R&D Systems, 406-ML-005) for 3 consecutive days. Using a Nikon Eclipse Ti-S microscope, mammospheres were imaged at 4× magnification with two images taken per well at the end of the treatment. The size and number of mammospheres were analysed using QuPath software.

EO771 cells were seeded in a poly-HEMA-coated 12-well low-adhesion culture dish at a density of 1.5 × 10⁵ cells per well in 1 ml organoid medium (DMEM/F12, 5% FBS, 1% penicillin-streptomycin 5,000 U ml−1, 20 ng ml−1 FGF2, 10 ng ml−1 EGF, 5 μM Y-27632, 4 μg ml−1 heparin plus 5% Matrigel). Cells were treated with either PBS or 10 ng ml−1 IL-6 (R&D Systems, 406-ML-005) for 3 days. Using an EVOS M7000 microscope, EO771 organoids were imaged at 4× magnification with five images taken per well every other day. The size and number of organoids were analysed using FIJI (ImageJ).

Measuring transgenic Her2 mRNA in leukocyte-depleted peripheral blood

MMTV-Her2 mice 12–14 weeks old were infected with 500 EIU Puerto Rico A/PR/8/34 H1N1 IAV or PBS as described above and euthanized using CO2 at 9 dpi. Blood was collected by intracardiac puncture and placed in heparin solution on ice. Following red blood cell lysis, lineage depletion was performed using a Miltenyi Direct Lineage cell depletion kit, mouse (Miltenyi, 130-110-470) following the manufacturer’s instructions. After lineage depletion, RNA was extracted using an RNeasy Plus Micro Kit (Qiagen, 74034) following the manufacturer’s instructions. Quantitative PCR with reverse transcription (RT–qPCR) was done using the iTaq Universal SYBR Green One-Step RT–qPCR (Bio-Rad, 172-5150) with primers for the MMTV-Her2 rat transgene; forward, 5′-CCCGAGTGTCAGCCTCAAA-3′; reverse, 5′-GCAGGCTGCACACTGATCA-3′. The RT–qPCR was run on a Bio-Rad thermocycler (CFX Opus 384).

Quantification and statistical analyses (mouse models)

Statistical analyses were done using Prism 10.2.1 software (GraphPad). Investigators were not blinded to allocation during virus (IAV or SARS-CoV-2) inoculation or antibody treatment. Quantification and image analysis were done in a blinded manner; n indicates the number of mice per group. A minimum of three slides per mouse were used for image analysis. Total HER2+ cell counts (Figs. 1c and 2b), HER2+ cells and HER2+ Ki67+ cells were counted manually using ImageJ. Three lung sections at least 50 µm apart per mouse were counted and summed. We collected and analysed PBS groups at each time point; because no differences in DCC expansion or phenotype were observed at different time points, results for PBS samples were pooled. For other image quantifications, whole-lung images were divided into fields using the ImageJ grid function and 8–10 fields were selected at random per image and counted. For experiments with two groups, a two-tailed Student’s t-test was used; for experiments with more than two groups, one-way ANOVA tests were used unless otherwise stated. Data were expressed as mean ± s.d. and P values ≤ 0.05 were interpreted as evidence against the null hypothesis (that is, no effect, no difference). Replicates represent different mice or different cultures, not repeated measures of the same sample. Graphs are presented as box and whiskers with dots representing individual values; the three lines represent the maximum (top line), median (middle line) and minimum (bottom line) values of the dataset.

Human observational data

We selected SARS-CoV-2 infections as the driver virus owing to the mandatory reporting of infections and COVID-19 disease during the early stages of the pandemic, allowing the use of real-world data to test the hypothesis that respiratory viral infections promote metastatic disease. Two complementary datasets from different regions of the world were analysed: the UK Biobank, which is a population-based study including 502,356 adult volunteers aged 40–69 years at recruitment from 2006 to 2010 (refs. 74,75), and the Flatiron Health electronic health record (EHR) database, which contains longitudinal data from about 280 US cancer clinics (around 800 sites of care) on patients with cancer and survivors76,77.

Population-based analyses of the UK Biobank

Study 1 was an analysis of UK Biobank data including lifestyle, anthropometric, medical history, SARS-CoV-2 testing and mortality data linked to national registries. Previous cancer diagnoses were obtained through consented linkage to the national cancer registry and SARS-CoV-2 test status through linkage to national registers. Mortality data were obtained from the national death registries (NHS Digital, NHS Central Register and National Records of Scotland). We considered all-cause mortality (including both primary and secondary causes), non-COVID-19 mortality (by excluding deaths with ICD codes U07.1 and U07.2 (ref. 78) or any death within one month of the latest recorded positive test result) and cancer mortality (considering cause of death with ICD codes listed in Extended Data Table 1).
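The rule used to define non-COVID-19 mortality can be restated as a small predicate. This is an illustrative sketch, not the study's code: "one month" is assumed here to mean 30 days, and the function name, argument names and example records are hypothetical.

```python
from datetime import date, timedelta

# ICD codes for COVID-19 named in the text.
COVID_ICD_CODES = {"U07.1", "U07.2"}


def is_non_covid_death(cause_codes, death_date,
                       latest_positive_test=None, window_days=30):
    """Return True if a death counts towards non-COVID-19 mortality.

    A death is excluded if any recorded cause carries a COVID-19 ICD code,
    or if it occurred within window_days after the latest positive test
    (window_days=30 is an assumption standing in for "one month").
    """
    if COVID_ICD_CODES & set(cause_codes):
        return False
    if latest_positive_test is not None:
        gap = death_date - latest_positive_test
        if timedelta(0) <= gap <= timedelta(days=window_days):
            return False
    return True


# Hypothetical records.
late_cancer_death = is_non_covid_death(
    ["C50.9"], date(2021, 3, 1), latest_positive_test=date(2020, 11, 1))
coded_covid_death = is_non_covid_death(["C50.9", "U07.1"], date(2021, 3, 1))
death_soon_after_test = is_non_covid_death(
    ["I21.9"], date(2020, 11, 20), latest_positive_test=date(2020, 11, 1))
print(late_cancer_death, coded_covid_death, death_soon_after_test)
```

Cancer mortality would be defined analogously by checking the cause codes against the list in Extended Data Table 1, which is not reproduced here.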

To evaluate whether SARS-CoV-2 test positivity affected all-cause, non-COVID-19 or cancer mortality, we implemented a rigorous matching strategy. Cancer survivors with a primary cancer diagnosis at least five years before the start of the pandemic and a positive COVID-19 test result were matched to cancer survivors with negative test results with a similar risk profile.

Of the 502,356 UK biobank participants, we excluded two groups: first, those with missing information on sex, age, body mass index, ethnicity, smoking status, alcohol consumption, education, employment status, household income, self-reported comorbidities, date of SARS-CoV-2 testing when the primary cause of death was COVID-19 and cancer diagnosis date if the primary cause of death was cancer (n = 65,245); and second, participants without any SARS-CoV-2 PCR test record (n = 195,559) (Extended Data Fig. 12d).

This left 241,552 participants, of whom 48,958 had been diagnosed with cancer at the latest follow-up (18 December 2022). From this group, we excluded five groups: participants with inconsistent dates of death (n = 8); those diagnosed with multiple cancers (n = 4,421); participants with a primary cancer diagnosis after the start of the pandemic (defined as 1 January 2020; n = 7,650); those who tested positive for COVID-19 after the UK vaccination rollout (1 December 2020; n = 13,274); and participants with cancer diagnoses less than five years before the pandemic onset (n = 9,969); this was to ensure that participants were, in all likelihood, in full remission and thus any residual metastatic cancer cells were likely to be dormant.

After these exclusions, the final cohort included 13,636 participants, of whom 531 tested positive for SARS-CoV-2, and 13,105 who tested negative before the vaccination rollout (Extended Data Fig. 12d).
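The cohort sizes reported above can be cross-checked with simple arithmetic. The sketch below uses only the counts given in the text and assumes the exclusion groups were applied sequentially, so that no participant is counted in two groups; the variable names are illustrative.

```python
# Stage 1: exclusions from the full UK Biobank cohort.
all_participants = 502_356
missing_covariates = 65_245
no_pcr_test_record = 195_559
with_test_record = all_participants - missing_covariates - no_pcr_test_record

# Stage 2: exclusions among participants diagnosed with cancer.
diagnosed_with_cancer = 48_958
exclusions = {
    "inconsistent dates of death": 8,
    "multiple cancers": 4_421,
    "primary diagnosis after pandemic start": 7_650,
    "positive test after vaccination rollout": 13_274,
    "diagnosis less than five years before pandemic": 9_969,
}
final_cohort = diagnosed_with_cancer - sum(exclusions.values())

# Test status within the final cohort.
positive, negative = 531, 13_105

print(with_test_record, final_cohort, positive + negative)
```

Both stages reconcile with the reported totals (241,552 participants with a test record; 13,636 in the final cohort), which supports the reading that the exclusion groups are mutually exclusive as reported.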

We used a non-parametric matching approach (without replacement)79 to identify (up to) ten negative-test participants for each positive-test participant. Matching was performed in two steps. We performed an exact matching based on cancer type and sex. Then, we matched for age, ethnicity, smoking status, alcohol consumption, education, employment status, household income and cancer diagnosis date (with a maximum allowable difference in cancer diagnosis of five years) using the nearest-neighbour method, an algorithm based on propensity score matching. The resulting matched population included 487 with positive tests (that is, we could not find good matches for 44 of those with positive tests) matched to 4,350 with negative tests.
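The two-step matching described above can be sketched as exact matching on cancer type and sex followed by greedy nearest-neighbour selection on a propensity score, without replacement. This is a simplified illustration, not the implementation cited in ref. 79: the score is assumed to be precomputed, the record layout and function name are hypothetical, and greedy ordering is only one of several nearest-neighbour strategies.

```python
def match_controls(cases, controls, ratio=10):
    """Greedy nearest-neighbour matching without replacement.

    Each record is a dict with 'id', 'cancer_type', 'sex' and 'score'
    (a precomputed propensity score; estimating it is out of scope here).
    Returns a dict mapping case id -> list of up to `ratio` control ids.
    Cases with no eligible control are left out of the result.
    """
    used = set()
    matches = {}
    for case in cases:
        # Step 1: exact matching on cancer type and sex.
        pool = [c for c in controls
                if c["id"] not in used
                and c["cancer_type"] == case["cancer_type"]
                and c["sex"] == case["sex"]]
        # Step 2: nearest neighbours on the propensity score.
        pool.sort(key=lambda c: abs(c["score"] - case["score"]))
        chosen = pool[:ratio]
        used.update(c["id"] for c in chosen)  # without replacement
        if chosen:
            matches[case["id"]] = [c["id"] for c in chosen]
    return matches


# Hypothetical records.
cases = [
    {"id": "p1", "cancer_type": "breast", "sex": "F", "score": 0.30},
    {"id": "p2", "cancer_type": "breast", "sex": "F", "score": 0.32},
    {"id": "p3", "cancer_type": "prostate", "sex": "M", "score": 0.50},
]
controls = [
    {"id": "n1", "cancer_type": "breast", "sex": "F", "score": 0.31},
    {"id": "n2", "cancer_type": "breast", "sex": "F", "score": 0.27},
    {"id": "n3", "cancer_type": "breast", "sex": "F", "score": 0.60},
    {"id": "n4", "cancer_type": "colon", "sex": "M", "score": 0.50},
]
matches = match_controls(cases, controls, ratio=2)
print(matches)
```

In this toy example the third case has no control with the same cancer type and sex and so remains unmatched, analogous to the 44 positive-test participants for whom no good match was found. The five-year cap on the difference in diagnosis date would act as an additional eligibility filter on the pool.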

Using test positivity as the predictor, we ran a series of unconditional logistic regression models for all-cause, non-COVID-19 and cancer mortality. Models were adjusted for all matching factors to account for potential residual confounding. We also repeated the analyses for patients with cancer diagnosed at least ten years before the start of the COVID-19 pandemic in the United Kingdom to further increase the likelihood that patients were in remission. This was achieved by excluding the positive-test participants who were diagnosed with cancer between 1 January 2010 and 31 December 2019 and re-running the matching procedure, resulting in 266 with positive tests and 2,228 matched individuals with negative tests.

Sensitivity analyses were conducted by varying censoring dates in six-month intervals from 1 June 2020 to 31 December 2022. Longer follow-up periods included more events, whereas shorter periods minimized potential bias from missing infection data and vaccination.

Flatiron Health EHR-based analyses

Data source

Study 2 used Flatiron Health’s nationwide EHR-derived database, including de-identified data from about 280 US cancer clinics (around 800 sites of care). The database is longitudinal, comprising de-identified patient-level structured and unstructured data, curated by technology-enabled abstraction76,77. Most patients in the database originate from community oncology settings, although the community and academic proportions may vary, based on study cohorts. The data were subject to obligations to prevent re-identification and protect patient confidentiality. Institutional Review Board approval of the protocol was obtained before the study was done and included an informed-consent waiver.

Included in our study were women aged at least 18 years old at the time of initial cancer diagnosis, and who had:

  i. early breast cancer; the cohort includes a probabilistic sample of patients with a diagnosis of stage I–III breast cancer on or after 1 January 2011, including those who presented with non-metastatic disease but who subsequently developed recurrent or progressive disease, with at least two visits occurring on or after 1 January 2011;

  ii. metastatic breast cancer; the cohort includes a probabilistic sample of patients diagnosed with stage IV breast cancer on or after 1 January 2011 and those who presented with earlier-stage breast cancer but who subsequently developed metastatic disease on or after 1 January 2011, and who had at least two clinic encounters evident in the database occurring on or after 1 January 2011; and

  iii. adult female patients aged 18 years or more at the initial diagnosis.

Real-world data source

The index date was defined as the date of the initial diagnosis of breast cancer. The COVID-19 status was defined as positive if any COVID-19 diagnosis (ICD codes B97.29, B97.21, J12.81, B34.2 and U07.1) was made after the index date and before the diagnosis of lung metastases or the last follow-up date. The data cut-off date was 31 August 2023. The start date of COVID-19 positivity status was the earliest COVID-19 diagnosis date. Baseline characteristics of gender, race, ethnicity and age at index date were obtained from structured data.
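The COVID-19 status definition above can be expressed as a function returning the start date of positivity. This is a sketch under stated assumptions: the comparisons are taken as strict ("after" the index date, "before" the end date), and the function name, record layout and example dates are hypothetical. Only the ICD codes come from the text.

```python
from datetime import date

# ICD codes listed in the text as defining a COVID-19 diagnosis.
COVID_DIAGNOSIS_CODES = {"B97.29", "B97.21", "J12.81", "B34.2", "U07.1"}


def covid_positive_start(diagnoses, index_date, end_date):
    """Earliest qualifying COVID-19 diagnosis date, or None if none qualifies.

    diagnoses: iterable of (icd_code, diagnosis_date) pairs.
    end_date: date of lung-metastasis diagnosis if there is one,
              otherwise the last follow-up date.
    """
    qualifying = [when for code, when in diagnoses
                  if code in COVID_DIAGNOSIS_CODES
                  and index_date < when < end_date]
    return min(qualifying) if qualifying else None


# Hypothetical patient: breast cancer diagnosed in 2015, followed to mid-2023.
records = [
    ("J12.81", date(2021, 2, 10)),
    ("U07.1", date(2020, 11, 5)),
    ("E11.9", date(2019, 6, 1)),    # unrelated diagnosis, ignored
]
start = covid_positive_start(records, date(2015, 4, 1), date(2023, 6, 30))
print(start)
```

A patient for whom the function returns None contributes only COVID-negative follow-up; otherwise follow-up switches to COVID-positive from the returned date onwards.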

Analyses

Baseline characteristics were summarized using descriptive statistics. Cause-specific analysis was conducted (death was censored). Univariable and multivariable Cox proportional hazard models were used to evaluate the effect of COVID-19 diagnosis on the risk of metastasis to the lungs, in which COVID-19 diagnosis status was treated as a time-varying covariate. The multivariable model was adjusted for patient characteristics considered relevant, including age, race and ethnicity. There were 36,216 COVID-19-negative patients and 532 COVID-19-positive patients (all 532 COVID-19-positive patients were COVID-19 negative at the index date) included in the multivariable analysis (Extended Data Fig. 12e). The median follow-up, the corresponding interquartile range (IQR) and the total number of accumulated person-years, for all patients, were 4.36 years, 6.21 years and 277,788 person-years, respectively. The median follow-up, the corresponding IQR and the total number of accumulated person-years, for patients’ COVID-negative period, were 4.35 years, 6.21 years and 277,115 person-years, respectively. The median follow-up, the corresponding IQR and the total number of accumulated person-years, for patients’ COVID-positive period, were 0.98 years, 1.08 years and 673 person-years, respectively. The unadjusted and adjusted hazard ratios with the corresponding two-sided 95% confidence intervals were reported. Two-sided likelihood ratio tests were conducted. The significance level was 0.05. The time to metastases to the lungs was defined as the time from the index date to the date of metastases to the lungs. Patients without a date of pulmonary metastases were censored at the last confirmed activity date or death. Last confirmed activity was defined as the latest date of vitals record, medication administration or reported laboratory tests or results.
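Treating COVID-19 status as a time-varying covariate means each patient's follow-up is split at the first COVID-19 diagnosis, with time before it counted as COVID-negative and time after it as COVID-positive. The sketch below illustrates that split and checks the reported person-year totals; the function name and the example patients are hypothetical, and the Cox model itself is not reproduced.

```python
def split_person_time(follow_up_years, covid_start_years=None):
    """Split one patient's follow-up into (negative, positive) person-years.

    Times are measured in years since the index date. A patient with no
    COVID-19 diagnosis during follow-up contributes only negative time.
    """
    if covid_start_years is None or covid_start_years >= follow_up_years:
        return follow_up_years, 0.0
    return covid_start_years, follow_up_years - covid_start_years


# Hypothetical patients: (total follow-up, time of first COVID-19 diagnosis).
patients = [(5.0, 4.0), (3.5, None), (8.0, 7.5)]
negative_years = sum(split_person_time(f, c)[0] for f, c in patients)
positive_years = sum(split_person_time(f, c)[1] for f, c in patients)
print(negative_years, positive_years)

# Consistency check on the totals reported in the text.
reported_negative, reported_positive, reported_total = 277_115, 673, 277_788
print(reported_negative + reported_positive == reported_total)
```

The reported COVID-negative and COVID-positive person-years sum to the reported total, as expected when every patient's follow-up is partitioned in this way.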
We performed additional multivariate analyses (MVA) to control for additional potential confounding factors, including comorbidities and breast cancer subtypes as sensitivity analysis. Comorbidity scores using the Elixhauser comorbidity index were computed using ICD-9-CM or ICD-10 codes as previously reported80. The diagnosis codes were included if the diagnosis dates were on or within 365 days after the initial diagnosis date. Cancer subgroups were based on the most recent test results recorded in the Flatiron Health database and the subgroups were defined as follows:

Triple-negative

Evidence of an ER-negative, progesterone receptor-negative and HER2-negative test result, in which HER2-negative is defined as negative with the cancer type not otherwise specified (NOS), next-generation sequencing (NGS) negative (ERBB2 not amplified), fluorescence in situ hybridization (FISH) negative/not amplified, IHC negative (0–1+) or IHC equivocal (2+);

HER2+

Defined as one or more of the following: positive NOS, IHC positive (3+), FISH positive/amplified, NGS positive (ERBB2 amplification);

ER+

ER-positive and/or progesterone receptor-positive test result(s).

A stratified Cox proportional hazard model with stratification factors stage, year of diagnosis, age group and cancer subgroup was used to evaluate the effect of COVID-19 diagnosis on the risk of metastases to the lungs while adjusting important covariates (age, race, ethnicity and comorbidity) at initial diagnosis. There were 23,876 COVID-19-negative patients and 359 COVID-19-positive patients included in this multivariate analysis. The adjusted hazard ratio with the corresponding two-sided 95% confidence interval was reported. The assumption of proportionality was assessed using the method outlined in ref. 81, indicating that there was no statistically significant evidence suggesting a violation of the proportional hazard assumption.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
