Two distinct host-specialized fungal species cause white-nose disease in bats

Sample and field data collection

Swab samples of P. destructans were collected from bat hibernacula. Sampling from hibernating bats was conducted without capture or handling by collecting samples while the bats remained freely hanging. The samples were collected by lightly swabbing the infected areas with a sterile dry swab (Polyester swab 164KS01, Copan). This method is considered minimally invasive or even noninvasive^45,46. The timing of sample collection was usually between January and April, when the highest numbers of bats with visible infection have been reported¹⁵. When no bats were present or sampling them was not possible, wall swabs were collected by touching the swab to hibernacula walls (ideally close to where bats usually hang to hibernate; see ref. ³⁴ for more details). Four isolates were also obtained from sediment samples collected from inside hibernacula and 22 were collected from caving gear (that is, caving suits and harnesses), which most probably originated from contact with hibernacula environments⁴⁴.

When a sample was taken from a bat or in close proximity (within about 10 cm), the bat species was also recorded. Temperature and relative humidity were measured in the hibernacula. Absolute humidity was then calculated from measures of relative humidity and temperature by applying a previously described formula⁴⁷.

Our work adhered to the ethical wildlife research guidelines of the American Society of Mammalogists for the use of wild mammals in research and education⁴⁸. Furthermore, this work was conducted under permission from the following authorities: Italy, Regional Speleological Federation of Emilia-Romagna (FSRER) and the Management Bodies of the Parks of Emilia-Romagna; Poland, Genarny Dyrektor Ochrony Środowiska (General Director for Environmental Protection) and Regional Directorate for Environmental Protection in Gorzów Wielkopolski (Regionalna Dyrekcja Ochrony Środowiska w Gorzowie Wielkopolskim); Switzerland, Kantonaler Fledermausschutz Aargau; Germany, Umweltamt, Veterinäramt, Untere Landschaftsbehörde Siegen-Wittgenstein, Untere Naturschutzbehörde Umweltamt Landkreis Harz and Referat Verbraucherschutz, Veterinärangelegenheiten Landesverwaltungsamt Sachsen-Anhalt, Untere Naturschutzbehörde des Landkreises Vorpommern-Greifswald, Regierung von Unterfranken, Regierung von Mittelfranken, Struktur- und Genehmigungs Direktion Nord/Süd, NLWKN Niedersächsischer Landesbetrieb für Wasserwirtschft, Küsten- und Naturschutz and Region Hannover–Fachbereich Umwelt; Austria, Department of Nature Conservation for Carinthia, Lower Austria, Upper Austria, Salzburg, Styria and Vorarlberg; Hungary, Pest Megyei Kormányhivatal, Országos Környezetvédelmi, Természetvédelmi és Hulladékgazdálkiodási Főosztály (Pest County Government Office, National Department of Environment Protection, Nature Conservation and Waste Management) and the Ministry of Environment and Water; Bulgaria, Bulgarian Ministry of Environment and Water; France, DDTM-Morbihan and DREAL; Republic of Latvia, Nature Conservation Agency; Belgium, Gouvernement Wallon; Denmark, The Nature Agency and Daugbjerg Kalkgruber; Romania, Speleological Heritage Commission; Estonia, Estonian Environmental Board; England, Natural England; Finland, Southwest Finland Centre for Economic Development, Transport and the Environment; Sweden, Uppsala djurförsöksetiska nämnd, Swedish board of Agriculture and the Swedish Environmental Protection Agency; Norway, Miljødirektoratet; Luxemburg, Ministère du Développement durable et des Infrastructures du Luxembourg; Croatia, Croatian Ministry of Environment and Nature; Russian Federation, Game Management Directorate of the Republic of Karelia, Institute of Plant and Animal Ecology and the Ural Division of the Russian Academy of Sciences; Slovak Republic, Ministry of the Environment of the Slovak Republic and the Department of State Administration for Nature and Landscape Protection; the Netherlands, Dutch Ministry of Economic affairs; Republic of Moldova, Government of Republic of Moldova–Ministry of Environment.

Laboratory materials and methods

Cultures and genotyping

Previously published DNA extraction and genotyping protocols were used^20,34, and are briefly outlined here. P. destructans was collected using sterile swabs from hibernating bats and the walls of sites where bats hibernate. The collected fungal material was cultured on dextrose peptone yeast agar⁴⁹ using classical mycological procedures and sterilization of tools between each use. After observing germination, typically 3–5 days after plating, plates were screened with a microscope to identify germinated single spores (identified as colonies expanding from a single germinating spore). Depending on availability, 1–3 (mean = 3.0, median = 3) and 1–5 (mean = 3.6, median = 2) single spores were typically isolated from bat and wall swabs, respectively. Isolation was performed by excising a plug with a sterile 3-mm biopsy punch and transferring it to a fresh 6-cm Petri dish. The plates were then visually monitored for 1 week to confirm that no additional spores germinated on the plug. Plates were grown at 10−15 °C until there was sufficient material to extract DNA (usually after several weeks to months). Each of these colonies is then referred to as an isolate or a culture. DNA extraction was done using a KingFisher Flex extraction robot (Thermo Scientific) with a MagMAX Plant DNA Isolation kit (Thermo Scientific). After DNA extraction, isolates were genotyped using 18 microsatellite markers and two mating-type markers in four multiplexes. The two mating-type markers were two independent primer sets used to amplify segments of the two mating types (MAT1-1 and MAT1-2) that are different in length and therefore diagnosable through fragment-length analysis²⁰. Genotyping was carried out on an ABI 3130 Genetic Analyser (Applied Biosystems), and GeneMapper software (v.5; Applied Biosystems) was used for fragment analysis.

DNA extraction for MinION and Illumina reads

Material was collected from P. destructans cultures using sterilized tweezers. We used a sorbitol wash buffer (100 mM Tris-HCl pH 8.0, 0.35 M sorbitol, 5 mM EDTA pH 8.0 and 1% (w/v) polyvinylpyrrolidone (PVP-40)) to clean the fungal material and to remove most of the culture medium from the hyphae (the wash was repeated twice with 5 min of incubation at room temperature each time). After removing the sorbitol wash buffer the second time (through centrifugation and removal of the liquid supernatant), 500 μl CTAB lysis buffer (preheated to 65 °C; 0.01 M Tris HCl pH 7.5, 25 mM EDTA pH 8.0, 1.5 M NaCl and 2% CTAB powder (w/v)), 30 μl proteinase K and 5 μl 1 M DTT were added for digestion and incubated overnight at 56 °C, mixing material after the first hour. After letting samples cool for 5 min at room temperature, 4 μl RNase A was added and left to incubate at room temperature for 10 min. One volume chloroform–isoamyl alcohol (24:1 v/v) was added, after which tubes were inverted 30 times and centrifuged for 5 min at maximum speed, keeping the supernatant. We then added a second step of proteinase K (30 μl) and RNase A (4 μl) treatment with an incubation for 30 min at 56 °C, as it was found to reduce the presence of RNA and result in better quality DNA. To remove these enzymes, we performed a second chloroform–isoamyl alcohol (24:1 v/v) extraction step by adding 1 volume, inverting 30 times and centrifuging at maximum speed for 5 min, after which the supernatant was kept. Precipitation of DNA was achieved with the use of 1/10 volume sodium acetate, 2 volumes ethanol (>99% purity) and centrifugation for 20 min at maximum speed. After gentle removal of the sodium acetate–ethanol mixture, the resulting pellet (containing the DNA) was washed twice with 70% ethanol. DNA was then eluted in ddH₂O and stored in the fridge. The DNA content was determined using Qubit (ThermoFisher).

Sample preparation for MinION nanopore sequencing

We performed long-read Oxford Nanopore Technology (ONT) sequencing of 12 isolates (5 Eurasian per clade, 1 North American and 1 outgroup; Supplementary Table 10) using MinION flowcells (FLO-MIN-106) using libraries prepared with an ONT Ligation Sequencing kit SQK-LSK109, following the manufacturer’s instructions. Statistics of the long-read sequencing and the associated assembled genomes are presented in Supplementary Tables 8 and 9.

Sample preparation for Illumina sequencing (individual isolates)

Illumina sequencing was performed for all the isolates for which we performed MinION long-read sequencing except for Gd1111, for which Illumina sequences were already available (Gd1111 = 20631-21, subculture of the type isolate). For all samples except Gd45 and Gd293, Illumina-indexed libraries were prepared for each isolate according to a previously described protocol⁵⁰ with modifications as proposed in a previous study⁵¹. Libraries were then sequenced (150 bp, paired-end) by Novogene on an Illumina NovaSeq 6000. For Gd45 and Gd293, libraries were prepared using TruSeq DNA PCR Free (350) and TrueSeq Nano DNA (350) kits, respectively, before being sequenced (150 bp, paired-end) by Macrogen on an Illumina HiSeq X.

Sample preparation for Illumina sequencing (Pool-seq)

We pooled DNA from multiple isolates (details on their origin presented in Supplementary Table 1) in equal concentrations into sample pools for sequencing. A total of four pools were prepared per clade with each isolate appearing in one pool only. For Pd-1, a total of 69 isolates were used (pool sizes: 17, 17, 17 and 18 isolates), whereas for Pd-2, a total of 63 isolates was used (pool sizes: 15, 16, 16 and 16 isolates). Within clades, isolates were assigned to a pool on the basis of the DNA concentration of their extracts (that is, the 17 and 15 isolates with highest DNA concentration for Pd-1 and Pd-2, respectively, were pooled together). The strategy of pooling samples into four pools per clade was used to validate the consistency of the results generated by each pool individually and the combined dataset (see the section ‘Genotyping’). Isolates were chosen to maximize both the geographical distance among sites and the genotypic richness within each clade. After DNA extraction (and quantification) of each isolate, DNA was combined to result in equal concentrations of isolates with a total of 500 ng DNA in a volume of 60 μl per pool. Illumina-indexed libraries were prepared for each pool (that is, isolates were not individually indexed for Pool-seq) according to a published protocol⁵⁰ with modifications as previously proposed⁵¹. Libraries were then sequenced (150 bp, paired-end) by Novogene on an Illumina NovaSeq 6000.

Analyses of multilocus genotypes

The analyses of multilocus genotypes (MLGs) were run in R (v.4.1.1)⁵², except for estimated effective migration surfaces (EEMS), using packages for specific analyses. Specifically, the package poppr (v.2.9.3)⁵³ was extensively used as it provides the tools needed for population genetic analyses of haploid species with clonal reproduction (such as P. destructans).

MLGs were defined by their unique combination of alleles across the 18 polymorphic microsatellite loci. This set of markers is sufficient to reliably identify the identity of MLGs both among and even within sites, for which MLGs are usually less differentiated³⁴. Across all isolates, the allelic richness was high, ranging from 10 to 93 alleles per locus (mean = 37); however, we found that some alleles were fixed in the Pd-2 clade (Supplementary Table 6). Only isolates with a maximum of 4 missing alleles (that is, successfully genotyped at 14 microsatellites or more) were used for analyses, which resulted in a dataset comprising 5,479 isolates.

Principal component analysis (PCA) was used to visualize the differentiation among isolates (package adegenet (v.2.1.5)⁵⁴). As the outcome of PCAs depends on sampling intensity⁵⁵, it was important to select roughly equal sample sizes among clades to capture their differentiation in Eurasia. To achieve this, we chose 51 sites (the same number of sites in which Pd-2 was found) from Pd-1 in a way that maximized geographical distance among them (that is, thinning sites) and used up to 20 isolates per site (again, to ensure that sampling among sites was not markedly uneven). This resulted in a dataset containing 234 and 92 isolates for Pd-1 and Pd-2, respectively, over 51 sites each (using unique MLGs only). PCA was then performed using this Pd-1 dataset subset and the full dataset for Pd-2. The results revealed two clusters, which were completely differentiated. The position of the North American isolates on the PCA plot was simply determined a posteriori by projecting or predicting their coordinates using the function ‘suprow’. Considering that maximizing distance among sites may also have an influence, we confirmed these findings on the Eurasian dataset by randomly subsampling 51 sites of Pd-1 repeatedly (100,000 times, but with the number of isolates per site still capped at 20) without considering the geographical distance between chosen sites. After running these 100,000 subsampled PCAs (with Pd-2 unchanged, the geographically thinned PCA run was added, which resulted in a total of 100,001 PCAs), the density of values observed for PC1 was consistent with the values obtained for the thinned dataset, which indicated that the signal of differentiation between Pd-1 and Pd-2 was strong and independent of geography or identity of the chosen sites (Fig. 1b).

We also used the microsatellite dataset to investigate the presence of population differentiation in each of the discovered clades. For this purpose, we investigated only Eurasian isolates (that is, excluding the 33 isolates from the United States) and treated Pd-1 and Pd-2 separately.

First, we used DAPC³³ (using the package adegenet) to assign each isolate to one site among all European sites sampled. If populations from different sites are genetically differentiated, one would expect the DAPC to assign isolates to their true site of origin (that is, where they were sampled) more often than expected by chance. Here each isolate was probabilistically assigned to sites based on the observed allele frequencies (no assumptions were made, for example, about the independence of loci). Each isolate was run in an independent DAPC, which resulted in a set of 2,191 and 279 DAPCs for Pd-1 and Pd-2, respectively (excluding all isolates from North America and with a limited number of 20 isolates per site, see also Supplementary Table 1; 120 PCA axes and 100 discriminant analysis (D_A) axes retained in all runs). As some isolates will always be correctly assigned by chance, it was important to quantify the percentage of correct assignments by chance compared with correct assignments based on observed allele frequencies. Hence, we ran the same sets of DAPCs after randomizing the site names (independently for each run) to ascertain the frequency of correct assignments occurring by chance and the expected distances between the isolates’ site of origin and their assigned sites if assignment was no better than random (randomized DAPC). The distances between assigned sites and sites of origin are presented in the main article for Pd-1 (Fig. 3b) and in Extended Data Fig. 9 for Pd-2.

To identify the European sites of origin of the North American introduction, we performed two separate analyses with two different methods: multivariate analyses (DAPC³³) and spatial Bayesian inferences (SPASIBA⁴⁰). DAPC analyses assign samples to sites already contained in the dataset, whereas the SPASIBA method can perform continuous assignment to any location (that is, coordinates) in the range defined by the user. First, we built a DAPC with all 5,162 Pd-1 isolates from 226 sites in Europe (the site in the Ural Mountains was excluded). The aim of this DAPC was to differentiate sites; hence, sites were used as pre-defined groups. We then used the ‘predict.dapc’ function from the adegenet R package to predict site memberships of the 33 isolates from North America. Running the DAPC while limiting the number of isolates per site to 20 provided identical assignment results. Second, we used the SPASIBA model, a spatial Bayesian inference (SPASIBA) method for geospatial assignment that models the spatial frequency of alleles by a set of spatially autocorrelated random variables with Gaussian distribution. Implementation was carried out in R using the SPASIBA (v.24.6.27) and INLA (v.0.0.4) packages based on the same dataset as described above for the DAPC analysis. That is 5,162 Pd-1 isolates from 226 sites in Europe for the reference data (‘geno.ref’) and 33 isolates from 9 sites in North America as individuals of unknown geographical origin to be assigned (‘geno.unknown’). For SPASIBA, we used the function ‘SPASIBA.inf’ with a ploidy level of 1 and a flat domain (sphere = false). To achieve a uniform spatial resolution, we configured the grid to consist of 198 pixels in longitude and 120 pixels in latitude. This setup ensured a consistent resolution across both dimensions, with each pixel representing an approximate resolution of 0.169°. To identify the most probable source of introduction inferred across the 33 North American isolates, we calculated the average assignment likelihood for each pixel across the 33 isolates (Supplementary Table 5), from which we computed the log-likelihood for representation (Fig. 4).

The visualization of EEMS (https://github.com/dipetkov/eems) was used to evaluate geographical barriers linked to patterns of gene flow³⁵. This method differs from PCA in that genetic differentiation is visualized as a function of migration rates rather than through genetic or genotypic distance. This method uses a population genetic model to compare expected pairwise genetic dissimilarities in relation to their geographical distances (that is, under a model of isolation-by-distance) with observed patterns across the sampled area. Specifically, a triangular grid with specific density (number of demes) is built over the area containing geo-referenced samples. For the edge of each grid, the migration parameter is estimated by Bayesian inference and Markov chain Monte Carlo sampling, which means that migration is estimated in an approximated stepping-stone model between neighbouring grid cells. As sampling locations will fall into the same or neighbouring cells depending on the grid cell size, the number of demes (defining the overall density of grid cells and hence their size) influences the outcome of estimated migration rates (particularly at small geographical scales). For this reason, we calculated the EEMS for a range of deme sizes (n = 8, 100–450 demes) in independent runs of 8 million iterations each (after a burn-in of 1 million iterations), as previously recommended³⁵. Runs were combined in a single figure for visualization of robust migration rates using the R package reemsplot2 (v.0.1.0)³⁵. It should be noted that estimated migration rates are most accurate closer to sampling locations and less accurate in sparsely sampled geographical areas. For this reason, in addition to clone-correction based on site identity (each genotype appears only once per site, additional occurrences are removed), the Russian isolates were excluded before calculating EEMS for both clades. This resulted in a total of 2,261 isolates used for Pd-1 and 107 isolates for Pd-2 (Supplementary Table 1). Markers Pd10 and Pd14 were uninformative for Pd-2 and hence removed for the EEMS (removed for the Pd-2 dataset only), which left a dataset with 16 markers.

Phylogenetic relationships between the 5,479 isolates (or 1,866 MLGs) were reconstructed using Nei’s ‘Da’ genetic distance and Cavalli–Sforza and Edwards Chord distance ‘Dch’, as both distance measures were found to be the best performing ones to retrieve the relationships between individuals⁵⁶. Genetic distances were then clustered using the UPGMA algorithm as implemented in Phangorn (v.2.11.1)⁵⁷ in R. Given that the topology for the nodes of interest was identical for both methods (data not shown), only the results from the Da distance are presented. Analyses were performed in R using functions implemented in hierfstat (v.0.5.11)⁵⁸. For visualization purposes of Fig. 1b, the ‘fish eye’ function of FigTree (v.1.4.4.)⁵⁹ was used to zoom into the section of the tree containing the Pd-2 clade. We jackknifed loci one at a time to test for the support of both Pd-1 and Pd-2 clades monophyly using the ‘is.monophyletic’ function in ape (v.5.7.1)⁶⁰.

Analyses of genomes de novo assembled from long-reads

Base calling

Base calling of fast5 files (from MinION sequencing) was performed on a GPU computer (hosted by the Montpellier Bioinformatics Biodiversity platform) by running the software Guppy (v.5.0.7), a state-of-the-art neural network base caller⁶¹.

Genome assembly

To assemble high-quality haploid genomes of P. destructans (P. destructans is haploid), we carried out adaptor trimming with Porechop on all base-called reads (https://github.com/rrwick/Porechop)⁶². These were then parsed into Flye (v.2.9)²⁵ with the –nano-hq flag. Other arguments were left as default (including automatic minimum overlaps). Genomes were polished once using pre-trimmed Illumina reads with HyPo (v.1.0.3) after initial mapping using paired-end mapper Burrows–Wheeler aligner—Maximal Exact Match, bwa-mem (v.0.7.17-r1188)^63,64. HyPo arguments included approximate genome length of 35 megabases (-s 35m; based on a previous study⁶⁵) and the average read depth of each genome (-c) (calculated with Samtools depth⁶⁶).

To remove contigs that potentially resulted from contamination, contigs with exceptionally low GC content (identified using infoseq EMBOSS (v.6.6.0.0)⁶⁷) were individually compared with the nucleotide BLAST database^68,69. On the basis of these results, we removed contigs not originating from P. destructans from four isolates (which contained clear contamination by Cellulosimicrobium cellulans, Penicillium solitum, Pyrenophora teres f. teres and Shiraia bambusicola).

Mitochondrial contigs, with a characteristic lower GC content and known length (about 32 kb), were also identified through BLAST and removed from all further analyses. To remove noise in our genomes resulting from spurious assembly, all contigs below 10,000 bp were removed using SeqKit (v.0.16.1)⁷⁰ (-m 10000). Statistics of the assembled genomes are presented in Supplementary Table 11.

Repeat annotation

We annotated repeat content of each genome with RepeatModeler and RepeatMasker tools. Build Database was followed by RepeatModeler (v.2.0.1), which we ran individually on all 11 genomes of P. destructans isolates to identify repeat regions with default parameters. All novel consensus repeat sequences were combined using CD-HIT-est (v.4.8.1) to remove redundancy in the clustered library⁷¹ (-aS 1 -c 1 -r 1 -g 1 -p 0). All final repeat sequences not annotated were then removed from the repeat library. Furthermore, BLASTN (v.2.9.0+) searches of the repeat library were carried out against the 9,405 annotated protein-coding genes of the P. destructans reference genome assembly downloaded from the NCBI (GCF_001641265.1_ASM164126v1). Any repeats with a sequence identity over 80% for over 80% of the query length (-outfmt 6, -perc_identity 80, manual removal of qcovs > 80) were also removed⁷². Finally, we removed duplicated fasta entries with Samtools (faidx). Masked assemblies as used downstream were produced using RepeatMasker (v.4.1.2)⁷³ (-xsmall) with the curated repeat library (-pa 5 -a -s -gff -no _is). The same pipeline was then used for the outgroup (Gd267), which was treated independently from P. destructans isolates.

Genome annotation

For gene annotation of all isolates, we chose to use the Funannotate pipeline (v.1.8.15) dedicated specifically for fungal genome annotation⁷⁴. We used the pipeline through the Galaxy Europe cluster (https://usegalaxy.eu). We started by soft masking all the genome assemblies using RepeatMasker tools (v.4.1.2; https://www.repeatmasker.org/RepeatMasker/)⁷³ with the transposable element (TE) library generated in this study (see the section ‘Repeat annotation’). We then used Illumina paired-end RNA sequencing data from European (Sequence Read Archive accessions SRP041673 and SRR1270711) and North American isolates⁷⁵ (SRP041668, SRR1270148, SRR1270408 and SRR1270412) in addition to the data from P. destructans-infected bats⁷⁶ (SRP055976) used for annotation of the published reference sequence⁷⁴. All RNA sequence paired-end reads were then mapped on the soft-masked genome assemblies using the RNA STAR mapping tool (v.2.7.10b)⁷⁷ (–genomeSAindexNbases 11bp). Funannotate-predict returns gene models based on read mapping using Augustus (v.3.4.0). It also uses curated databases (UniProtKb/SwissProt databank) for proteins to help predict probable gene structures. For the initial training of predictors, Funannotate-predict also uses BUSCO (v.5.2.2)⁷⁸ for initial Augustus species training. For this step, we used the phylogenetically closest species available on the Galaxy server: Fusarium (orthoDB v.10). We did not use the ab initio predictor dedicated to fungal genomes as the option created fragmented gene models. The results of the Funannotate-predict were combined to generate functional genome annotations using Funannotate-functional. Protein evidence generated was compared with the Funannotate database (v.2022-01-17-193541).

BUSCO genes

Genome assembly for each isolate was benchmarked with BUSCO (v.5.2.2) (hmmsearch v.3.1 and metaeuk v.5.34c21f2) using the option -m genome flag for the Kingdom fungi odb10 database from orthoDB (v.10). The fungi database contains 758 orthologous gene sequences^79,80. Basic statistics of reads were obtained with NanoStat and NanoPlot (v.1.42.0)⁸¹.

Sequence divergence

Sequence divergence for BUSCO genes was calculated from the MAFFT alignment (described below) in R, with the function ‘dist.dna’ from the ape package⁶⁰. For each of the assemblies of full genomes, subcontigs were obtained by deleting the repeated and low-complexity sequences detected using RepeatModeler and RepeatMasker pipelines (as described above). A local alignment of the subcontigs was carried out with NUCmer4 (v.4.0.0)⁸² for all the isolates against each other and interpreted using the show-coords program by applying a minimum percentage of sequence identity of 80%. The aligned regions were used to calculate the weighted average identity.

Synteny

To gain better insight into the similarities and differences in genomic structure and organization between clades, we performed network-based microsynteny analysis. This enabled us to investigate gene-copy number, to identify synteny conservation, to detect and quantify clade-specific syntenic clusters and to reconstruct phylogenetic relationships between isolates based on microsynteny. The detection of syntenic blocks performs best when using high-contiguity genomes assembled de novo (that is, without a reference). According to a previously published systematic evaluation⁸³, the characteristics of our genomes (mean N50 of 1.8 Mb; gene density estimated around 270 per Mb: 10,000 genes and 36.9 Mb per genome on average; Supplementary Table 11) should allow for a robust synteny analysis.

Using annotated genes from the 11 P. destructans genomes (detailed in the section ‘Genome assembly’), all inter-pairwise and intra-pairwise all-vs-all protein similarity searches were conducted using DIAMOND (v.2.1.7), for which the top 5 hits were kept in searches⁸⁴. The output generated by DIAMOND was used as input for the MCScanX algorithm⁸⁵ to perform pairwise synteny blocks detection. To verify that parameters used for MCScanX had no impact on our results, we performed microsynteny block detection (and all downstream analyses) under 25 different parameter settings (Supplementary Table 17). These parameter settings involved the two key parameters of MCScanX, namely, the minimum required number of genes (anchors) to call a syntenic block (-s, MACH_SIZE: 10, 15, 20, 25, 30) and the number of upstream and downstream genes to search for anchors (-m, MAX_GAPS: 15, 20, 25, 30, 35). All the syntenic genes identified in the syntenic blocks were used to build a microsynteny network using the Infomap algorithm (implemented in the infer_syntenet function in the R package Syntenet (v.1.8.1)⁸⁶, which has been demonstrated to be the best clustering method available for synteny networks⁸⁷. In the synteny network, genes represent nodes, and syntenic relationships between genes are represented by edges connecting the nodes. A median average clustering coefficient of 0.95 was obtained across parameter settings (min–max: 0.94–0.95%; Supplementary Table 17), which revealed that genes tend to form a complete subgraph or cluster with their syntenic neighbours across multiple genomes. Phylogenomic profiling was then performed using the inferred syntenic clusters, which resulted in a matrix m_ij of the phylogenomic profiles, which represented the number of genes from cluster j that can be found in genome i. To reveal synteny clusters that were conserved and clade-specific clusters, clusters were visualized as a heatmap with clusters clustered using Ward’s clustering on a matrix of Euclidean distance (Fig. 2e). The phylogenetic signal present in the synteny network was used to infer a microsynteny-based phylogeny of the 11 genomes. The binarized matrix of phylogenomic profiles was used in IQTREE2 (v.2.0.6) applying the MK + FO + R model with node support evaluated by two methods: 1,000 bootstrap replicates and 1,000 replicates for the SH-like approximate likelihood ratio test. The mid-point method was applied to root the tree using the ‘midpoint’ function in the ‘phangorn’ package⁵⁷ and the monophyly for the clades Pd-1 and Pd-2 was evaluated using the ‘AssessMonophyly’ function in ‘MonoPhy’ (v.1.3.2)⁸⁸.

Analyses of Illumina reads (individually tagged isolates)

Data checking and mapping

Illumina sequences were available for 18 isolates (listed in Supplementary Table 10). Fastp (v.0.23.4) was used to remove bases with a phred quality value lower than 30 (‘-q 30’). Subsequently, bwa-mem2 (v.2.2.1) was used to align the Illumina reads to each of the two reference genomes (Gd293 for Pd-1 and Gd45 for Pd-2). Samtools view (v.1.16.1) was used to only keep reads with mapped (-F 0×4), properly paired (‘-f 0×2’), and high-confidence mapping quality (MAPQ values of 60: ‘-q 60’). Samtools sort and Samtools index were used to respectively sort and index the obtained BAM files. Reads were assigned using Picard (v.2.27.1) and the AddOrReplaceReadGroups tool. The SortSam tool (from Picard) was used to sort the BAM file by queryname (QNAME) before removing duplicate reads using the MarkDuplicates tool (from Picard). SortSam was subsequently used to sort the BAM file by coordinate before indexing using Samtools index.

Base calling, genotyping and filtering

We then performed base calling with ‘gatk HaplotypeCaller’ (gatk v.4.2.6.1) using the ‘-ERC BP_RESOLUTION’ mode and a ploidy of 1. Genotyping was done with ‘gatk GenotypeGVCFs’. ‘gatk SelectVariants’ was then used to extract the non-variable positions in one file and the variable positions (SNP only) in another file (insertions and deletions were discarded). Hard filtering of variable positions that did not meet the criteria ‘QUAL < 30.0’, ‘QUAL/DP < 2.0’, ‘SOR > 3.0’, ‘FS > 60.0’, ‘DP < 20.0’ and ‘MQ < 40.0’ was carried out using ‘gatk VariantFiltration’ and ‘gatk SelectVariants’. Isolate genotypes with a read depth lower than 20× at a given position were set to no-call. ‘gatk SortVcf’ was then used to merge the filtered variable and non-variable positions into a single VCF file. ‘bedtools subtract’ was subsequently used to remove masked positions from the reference genome and positions called as heterozygous when considering the isolates as diploid (see the section ‘Filtering repetitive DNA’).

For each contig of each of the two reference genomes (Gd293 and Gd45), we then checked for the mean read depth, when mapping the 18 isolates to each reference genome, to detect unusually high read depth that would indicate the presence of repetitive sequences in the regions that nevertheless passed our filters described above. For each contig, we also checked for the average number of isolates with missing data per position to identify contigs with high levels of missing data. For Gd45, out of eight contigs that were smaller than 50 kb, two had no reads mapped, and five out of the six remaining had unusual mean read depth (lower or higher than other contigs) and/or a high average number of isolates with missing data (Supplementary Table 12). Therefore, all eight contigs below 50 kb, representing in total only 0.46% of the Gd45 genome were excluded from the analyses. For Gd293, all contigs were larger than 50 kb and a single contig (contig_44) deviated from the values observed for other contigs by having a read depth about 100× higher than other contigs and having on average 13 isolates with missing data per position (Supplementary Table 12). This contig, most likely an accessory chromosome with a high proportion of repetitive sequences, was therefore excluded from further analyses. After all filtering steps were performed (Supplementary Table 13), the average read depths were 237 and 240 when mapping on Gd293 and Gd45, respectively. The correlation between the read depths of the 18 genomes when mapped on Gd293 and Gd45 was 0.996 (Spearman’s rank correlation ρ; P < 0.001), which highlighted that the mapping worked equally well on either reference genome.

On average, each isolate had a genotype for over 95% of positions (Supplementary Table 13). In the end, we obtained data on 21,609,224 positions when using Gd293 as the reference (including 92,593 biallelic SNPs) and 22,041,192 positions when using Gd45 as the reference (including 95,638 biallelic SNPs). These data were stored in a single VCF file (with 18 samples) per reference genome. Furthermore, to evaluate the quality of our full procedure, we checked the number of differences obtained by mapping the Illumina reads of Gd293 to the de novo-assembled MinION genome (reference genome Gd293, see the section ‘Genome assembly’ for assembly details). We did a similar analysis for Gd45 mapped to the reference genome Gd45. For Gd293, the number of differences with the reference genome (itself) was 19 variants out of 23,660,591 positions in the final VCF file, which indicated a combined mapping, base calling and genotyping error rate of 8.030231 × 10^–7; that is 1 error every 1,245,294 bp, which corresponded to a quality score of Q60.95. For Gd45 (mapped on the reference genome Gd45), the error rate was also extremely low: 2.444712 × 10^–6 (54 variants out of 22,088,490 bp), or 1 error every 409,046 bp (Q56.11). These data provide evidence that our pipeline, which incorporated several stringent filters, recovered highly reliable SNPs while preserving a large amount of data (>22 Mb accounting for around 60% of the reference genomes).

Diversity and differentiation calculation

Genetic diversity π was calculated per clade (11 for Pd-1 and 7 for Pd-2) across 50 kb windows using pixy (v.1.2.7.beta1), which takes into account missing data in calculations and hence provides unbiased estimates⁸⁹. The index of genetic differentiation (F_ST; Weir and Cockerham’s method) between the two clades was calculated using a modified version of vcftools (https://github.com/jydu/vcftools) to allow the computation of statistics with haploid data.

Phylogenetic relationships

The relationships between the 18 P. destructans isolates (Supplementary Table 10) and the Pseudogymnoascus sp. outgroup (isolate 267) were reconstructed using maximum likelihood in IQ-TREE2 (v.2.0.6)⁹⁰. The relationships were reconstructed when considering two methods to partition the genome: (1) BUSCO genes alone, and (2) 10 kb windows across the whole genome. Each of these partition methods were used in combination with each of the two reference genomes (Gd293 and Gd45; see the section ‘Data checking and mapping’), which resulted in a total of four datasets. All four datasets converged towards the same topology, recovering Pd-1 and Pd-2 as two reciprocally monophyletic clades (Fig. 2b and Supplementary Fig. 5).

BUSCO phylogeny

To produce the phylogenetic tree using newly sequenced and publicly available BUSCO genes of the isolates, we used all single copy, complete orthologues as identified from the fungi odb10 database, that were common to all 18 isolates (Supplementary Table 11) and the Pseudogymnoascus sp. outgroup (isolate Gd267). This resulted in 664 and 662 BUSCO genes when mapping on Gd293 and Gd45, respectively. We extracted the genes from each assembly with BEDTools (v.2.30.0-8)⁹¹ using the command getfasta before individually aligning with MAFFT (v.7.453)⁹² (–auto, –adjustdirection). In IQ-TREE2, a concatenation-based species tree with edge-linked proportional partition model with 1,000 ultrafast bootstrap (-B 1000 -T AUTO) was produced⁹³. To produce the species concordance factor, which measures how consistent the genealogical relationships are across different loci, the orthologue and species trees were used (–scf 100–prefix concord -T 10) by IQ-TREE2. The final tree was manually rooted at the outgroup, Gd267, in FigTree (v.1.4.4; http://tree.bio.ed.ac.uk/software/figtree/). To represent conflicting phylogenetic signals between gene trees (for example, owing to recombination and/or incomplete lineage sorting), we used the function consensusNet from the phangorn R package⁹⁴ and computed the consensus network from the splits occurring in the different gene trees. Only splits occurring in at least 10% of trees were represented in the network.

Whole-genome phylogeny

The relationship between the 18 P. destructans isolates (Supplementary Table 10) and the Pseudogymnoascus sp. outgroup (isolate 267) was reconstructed using maximum likelihood in IQ-TREE2 (v.2.0.4)⁹⁰. We used vcftools with the ‘missing-site’ function to extract missingness on a per-site basis from the VCF file containing the genotypes of the 19 (18 + 1) individuals (see the section ‘Base calling, genotyping and filtering’). Then, we calculated the number of positions with no missing data over 10 kb non-overlapping windows and only kept windows with at least 5,000 positions with no missing data across the 19 isolates. For each of the 19 isolates, we then used BEDTools getfasta to extract sequences (FASTA format) from these windows. In each window, only positions without missing data (that is, minimum 5 kb) were kept. We obtained a total of 1,275 and 1,288 genomic partitions when using the reference genomes from each clade, Gd293 and Gd45, respectively. A concatenation-based species tree with edge-linked proportional partition model with 1,000 ultrafast bootstrap (-B 1000 -T AUTO) was then produced. This procedure was applied in parallel to the data mapped on reference genomes from each clade, Gd293 and Gd45 (VCF files).

Analyses of recombination

To test for the presence of recombination in each of the two clades, we used two tests, the pairwise homoplasy index (PHI or Φw) test and the four-gamete test (FGT). The Φw test calculates a pairwise similarity index between closely linked sites (situated in w bases) and compares observed values to values obtained after permutation of the sites. Under the null hypothesis of no recombination, the genealogical correlation between adjacent sites remains unchanged even if the order of the sites is shuffled. This is because all sites share the same evolutionary history when recombination is absent. However, when recombination occurs, the order of the sites becomes important as distant sites tend to have weaker genealogical correlations compared with adjacent sites⁹⁵. We used the Φw test as implemented in Splitstree CE (v.6.3.27)⁹⁶ to test the null hypothesis of clonality. The FGT test involves counting the number of allelic combinations between any pair of SNPs on the same contig. Assuming an infinite site model, the observation of four combinations is incompatible with the absence of recombination. For example, if two SNPs have the alleles C/T and A/G, then the observation of the haplotypes CA, CG, TA and TG must be the result of a recombination event. We calculated the number of haplotypes using the script FGT.pl (https://github.com/dbsloan/fgt). In each clade, we performed FGT on windows of 100 kb, with at least 20 kb of unmasked nucleotide positions and no missing data. We performed these analyses on the genomic data (Illumina data from individually tagged isolates) from the 18 isolates (11 from Pd-1 and 7 from Pd-2; described in section ‘Analyses of Illumina reads (individually tagged isolates)’) mapped on the reference genomes (Gd293 and Gd45) of Pd-1 and Pd-2 clades, respectively.

Next, we estimated the population recombination rate (r = 2N_er, where r is the recombination rate per bp per generation and N_e is the effective population size) using LDhelmet software⁹⁷ (v.1.10; https://github.com/popgenmethods/LDhelmet). We used the parameters -t 0.0005 and -t 0.001 for the population mutation rate (q = 2N_em, where m is the mutation rate per bp per generation) for Pd-1 and Pd-2, respectively, -w 200, and we estimated the mutation transition matrix using the IQTREE2 substitution model GTR⁹⁸. Default values were used for other parameters following the LDhelmet manual.

Apart from formal recombination tests carried out as explained above, we used the reconstructed phylogenetic tree obtained with microsatellite data (described in the section ‘Analyses of multilocus genotypes’) to map the two mating types. This analysis demonstrated the presence of both mating types in Pd-1 and Pd-2 clades, and the presence of both mating types throughout the phylogenetic tree depicting relationships between MLGs in each clade. These data provide further evidence for recombination in Pd-1 and Pd-2 clades (Supplementary Fig. 1).

Modelling demographic history

To model the evolutionary history of both clades, we implemented Approximate Bayesian Computation comparing two demographic models with and without contemporaneous migration. Specifically, we evaluated the isolation-migration model (IM) and the strict-isolation model (SI). These two models imply a population split at a time T_split in the past, but this event is followed by constant migration in the IM model or no migration in the SI model. The ancestral population and the derived population have an independent population size (N_a, N₁ and N₂). Population size is also free to vary within population 1 and 2 independently once at time T_dem1 and T_dem2 in the past for population 1 and 2, respectively. Coalescence simulations were generated using msnsam (October 2007 version)⁹⁹, which is a modified version of ms¹⁰⁰. The priors for population size, T_split and migration rate were generated using uniform distributions. Priors and summary statistics were computed using scripts taken from DILS¹⁰¹ (https://github.com/popgenomics/DILS_web). One million simulations were performed for both models using a mutation rate of 1.5 × 10^–9 mutation per site per generation, a recombination rate half of the mutation rate, a T_split between 10 and 10 million generations and an N between 10 and 10 million individuals per population. As performing the analyses on the full genome would be computationally too intensive (for the simulations part), we instead used a representative sample across the genome. To accomplish this, we selected 1 kb sequence windows, spaced 10 kb apart. This resulted in a dataset of 4,832 and 2,881 windows for the datasets when using Pd-1 and Pd-2 reference genomes, respectively.

Model selection was performed using the random forest method implemented in the R package abcrf (v.1.9)¹⁰² and using the postPr function of the R package abc (v.2.2.1)¹⁰³. Parameter estimations were obtained using the neural network method implemented in the abc package using 5,000 simulations. Finally, we computed a goodness-of-fit statistic for each of the summary statistics as the proportion of simulations of the posterior distributions with a statistic superior or inferior to the observed statistic with this proportion P always smaller than 0.5.

The model selection procedure led to the selection of the SI model, with posterior probabilities of 0.77 and 0.75 with the random forest method for references Gd293 and Gd45, respectively. The rejection method led to the selection of the SI model 79% and 76% of the times for reference Gd293 and Gd45, respectively. Therefore, both model-selection methods support the model without contemporaneous migration. The full set of parameters estimated from the posterior distributions of the SI model using the neural network method is presented in Supplementary Table 16. Based on the SI model, the divergence time between Pd-1 and Pd-2 clades was estimated to have occurred between 114,000 and 1.5 generations ago (Supplementary Table 16).

Analyses of Illumina reads (Pool-seq data)

Data checking and mapping

We checked and mapped the Illumina data (each of the four pools per clade) as detailed in the section ‘Data checking and mapping’ for Illumina reads above.

Data pooling

For each of Pd-1 and Pd-2, we had four pools (hereafter called ‘subpools’) of isolates (with 17, 18, 17 and 17 isolates for Pd-1 and 16, 16, 16 and 15 isolates for Pd-2). Before combining the four subpools per clade, we calculated the number of reads per subpool and subsampled them with Samtools view to have the same number of reads per isolate (1,704,260 reads) in each subpool. We then used Samtools merge to merge BAM files for the four subpools per clade, which resulted in one final pool (pool of four subpools) of 69 isolates for Pd-1 (117,589,122 reads) and one final pool (pool of four subpools) of 63 isolates for Pd-2 (107,355,848 reads).

Genotyping

Genotyping was carried out separately for each pool (one for Pd-1 and one for Pd-2), mapped on the reference genome from clade 1 and 2, which resulted in four separate analyses. Samtools mpileup was then used to generate pileup outputs from BAM files. For SNP calling, we applied a Bayesian approach specifically designed for pools (SNAPE-pooled^104,105) that calculates the posterior probability that a site is polymorphic. This approach has been validated using both simulations and empirical approaches and is among the best performing method currently available¹⁰⁴. SNAPE-pooled prior parameters were sets as follows: θ = 0.0005, D = 0.0025 or D = 0.0005 when mapping the pool on the reference from the different versus same clade respectively, prior = ‘informative’, fold = ‘folded’, and nchr = number of (haploid) isolates in the pool. We converted the SNAPE-pooled output file to a VCF and as recommended by the program developers, we only considered positions with a posterior probability ≥0.9 as being polymorphic. In all other cases, positions were marked as monomorphic. To avoid including sequencing errors as rare alleles, we adopted two complementary strategies. First, when converting the SNAPE-pooled output to a VCF, we only included alleles that were supported by at least five reads in the pool (that is, the four subpools together). Second, we performed SNAPE-pooled analyses on each subpool (with parameters as described above) to identify alleles that were not supported by at least two reads in two subpools. These alleles were then removed from the VCF file of the pool.

Data filtering

To limit false-positive SNPs, we applied further filters to the VCF file to remove the following: (1) regions where repetitive DNA elements were either confirmed or suspected; (2) regions with low or high read depth (based on the Pool-seq data); and (3) regions that were identified as problematic based on read depth of individually tagged isolates (see the section ‘Base calling, genotyping and filtering’ of Illumina reads).

Filtering repetitive DNA

The removal of regions with confirmed or suspected repetitive DNA (including paralogues) that could not be confidently mapped with short-read data was carried out using three complementary approaches.

First, we removed regions that were masked from the reference genome (see the section ‘Repeat annotation’).

Second, we removed regions suspected to contain hidden repetitive DNA. Although we removed regions that were masked from the reference genome (see above), a single or a set of genomes (18 in our case, see section ‘Repeat annotation’) is unlikely to harbour the full extent of the repertoire existing in repetitive DNA in a larger number of isolates such as in the Pool-seq dataset. Hidden repetitive DNA can generate spurious heterozygous genotypes that can confound the estimation of genetic differentiation between populations or species (for example, ref. ¹⁰⁶). Hence, when using reads from an isolate that was not used to build the repeat library or the filters (for example, Pool-seq data), the unique mapping of its reads to a reference genome does not per se confirm that these reads originated from non-duplicated regions. We used levels of genetic diversity (π) as a proxy to detect hidden repetitive DNA. Repetitive DNA elements sharing a common ancestor accumulate mutations; therefore, when such loci (two or more copies) are erroneously merged together as a single locus, one expects higher levels of genetic diversity provided that everything else remains equal. We therefore removed regions with levels of π greater than 1%. This threshold was obtained by calculating the level of genetic diversity observed across 18 genomes rather than pools of individuals (see the section ‘Analyses of Illumina reads (individually tagged isolates)’; Supplementary Table 10) for which only 0.43% and 1.43% of sites for Pd-1 and Pd-2, respectively, showed π values greater than 1%. π was calculated per site using pixy (v.1.2.7.beta1), which takes into account missing data in calculations and hence provides unbiased estimates⁸⁹. Site-based calculations from pixy were then averaged in R (function ‘runner’) over 100 bp sliding windows by summing the raw counts and recomputing the differences/comparisons ratios (https://pixy.readthedocs.io/en/latest/output.html). For the Pool-seq data, genetic diversity (named ‘Q1’; see refs. ^107,108) was calculated using the function ‘computeFST’ in the poolfstat (v.2.0.0) package in R. Sites included in sliding windows with genetic diversity greater than 1% were stored in a BED file. Bedops (v.2.4.41) was used to flatten all disjoint, overlapping and adjoining element regions into contiguous, disjoint regions. BEDTools (v.2.30.0-8) merge was used to merge features that were separated by 500 bp maximum.

Third, using the same rationale as the second filter detailed above but with a different approach, we took advantage of the nature of the data (P. destructans isolates are haploid) to identify and exclude regions of the genome where mapping would lead to the calling of a heterozygote when performing the analyses of a haploid isolate in diploid mode¹⁰⁹. Each of the 18 isolates (as listed in Supplementary Table 10 (except the outgroup)) were mapped against the reference genome (as per the section ‘Data checking and mapping’ for Illumina reads) and analysed in diploid mode (that is, considering that the isolate is diploid). For this, we used the same pipeline as for base calling isolates (with gatk; see the section ‘Analyses of Illumina reads (individually tagged isolates)’) but with ploidy of 2. Sites scored as heterozygote in at least one isolate were recorded in a BED file. A total of 183,756 heterozygote sites (spread across contigs) were identified in those 18 genomes when using Gd293 as the reference and 177,785 when using Gd45 as the reference. All those heterozygote sites were removed from all 18 genomes, irrespective the isolates they originated from. Although these data alone might indicate that P. destructans is diploid, results from the microsatellite analysis firmly refuted this hypothesis. Indeed, we genotyped 5,479 isolates originating from single spore (that is, monosporic isolation) cultures for 18 microsatellite loci (see the section ‘Cultures and genotyping’), and out of the 98,622 genotypes (5,479 × 18), we never encountered 2 alleles per locus for any single spore isolate. Two alleles for some loci and some isolates would be expected if P. destructans was diploid. Furthermore, such levels of heterozygosity have already been reported when base calling haploids in diploid mode in other fungal species (for example, ref. ¹⁰⁹).

The BED files created for the three complementary approaches detailed above were then processed in Bedops to flatten all disjoint, overlapping and adjoining element regions into contiguous, disjoint regions and BEDTools merge was used to merge features that were separated by 500 bp maximum. These regions contained in the final BED file were then excluded from the VCF file in BEDTools substract.

Filtering based on read depth

Many species of fungi have accessory chromosomes, and the results from the section ‘Base calling, genotyping and filtering’ provide strong evidence that this is also the case for both of the P. destructans clades. As a result, stringent filtering based on read depth alone was not possible. We therefore only performed light filtering by setting sites with a read depth below 20× or above 600× per pool as missing data. Based on read depth and levels of missing data from individually tagged isolates (section ‘Data checking and mapping’ for Illumina reads; Supplementary Table 12), we identified a few problematic contigs that were also filtered out from the Pool-seq data.

Filtering outcomes

The filtering steps described in the previous two sections resulted in a narrowing of the 95% highest density interval (hdi95) of the read depth for both clades when mapped to the reference genome of both clades (Supplementary Fig. 6). When using reference genome Gd45, Pd-1 hdi95 narrowed down from 5–529 to 234–560 whereas Pd-2 hdi95 narrowed down from 124–470 to 217–470. When using reference genome Gd293, Pd-1 hdi95 narrowed down from 92–512 to 236–523 whereas Pd-2 hdi95 narrowed down from 5–511 to 227–542. Positions filtered out were mostly, although not exclusively, positions with read depth less than 200 (Supplementary Fig. 6), which meant that accessory chromosomes constitute a substantial part of the data filtered out. This result was expected, as accessory chromosomes are known to be repeat-rich¹¹⁰, hence they are expected to be more heavily filtered out than core chromosomes. After filtering, the dataset consisted of 17,110,071 and 16,733,801 positions for Pd-1 and Pd-2 pools, respectively, mapped to Gd293, and 16,675,338 and 17,286,690 positions for Pd-1 and Pd-2 pools, respectively, mapped to Gd45. The median read depths were 446 and 382 for Pd-1 and Pd-2, respectively, mapped on Gd45. The median read depths were 425 and 437 for Pd-1 and Pd-2, respectively, mapped on Gd293. A similar number of SNPs (55,919 and 57,330) was identified when mapping the pools on Gd293 and Gd45, respectively.

Calculation of the index of differentiation F
_ST

The VCF file was then imported into R using poolfstat (v.2.0.0)¹⁰⁸ and the vcf2pooldata function (designating the pool size as 69 and 63 for clade Pd-1 and Pd-2, respectively, max.cov.per.pool = 600, min.cov.per.pool = 20). The multilocus F_ST was then calculated across 200 SNPs with the compute.FST function.

Variations in growth rates and growth-medium colouration

To evaluate variations in culture-related properties, a subset of isolates was photographed in a custom-built photobox using the same camera, lens and settings (Canon EOS 600D with 60 mm Canon Macro lens EF-S, shutter speed = 1/6,000, aperture = 3.2, ISO = 400, evaluative metering; see ref. ¹¹¹), biweekly for 8 weeks. For this purpose, 45 and 34 isolates of Pd-1 and Pd-2, respectively, were re-cultured on the same day on the same batch of culture medium. Six days later, for each isolate, a single germinating spore was physically moved to a new plate with growth medium (detailed in the section ‘Cultures and genotyping’) and stored upside down at 10 °C. The identity of isolates is provided in Supplementary Table 1.

Analysis of growth

The analysis of the pictures was carried out in R (v.4.1.1)⁵² using the packages EBImage (v.4.3)¹¹², ks (v.1.14.1)¹¹³, adehabitatHR (v.0.4.21)^114,115, sp (v.1.4-6)¹¹⁶ and adimpro (v.0.9.6)¹¹⁷. Images were imported into R using the readImage function that extracts the intensity of each pixel of the red, green and blue channels. Pixels with red intensity above 0.7 were characteristic of cultures, whereas the background (culture medium) was below 0.7. Coordinates (that is, their position in the image) of pixels with red intensity above 0.7 were stored and used to calculate the minimum convex polygon (MCP) using the mcp function of the adehabitatHR package. The MCP was used to outline the edge of the fungal growth and to calculate the number of pixels it covered. To overcome potential issues with dark non-fungal material present on the plate (for example, droplets of water or dirt) that would be included in 100% MCPs and hence artificially increase culture size, we calculated 11 MCPs per isolate, excluding 5−15% of outliers in steps of 1 (11 values; that is, MCP 85 until 95%). We then extrapolated the number of pixels covered by the fungus (for example, for an isolate covering 180 pixels calculated with an MCP 90%: 180/90 × 100 = 200 pixels). For each MCP, pixels were finally transformed to square centimetres with the use of cross multiplication in relation to an object of known size (5.207681 cm²). For each isolate, the average across the 11 MCPs was used, and the quality of the estimates was evaluated using the standard deviation of the estimate made for the 11 MCPs and visual inspection of the edge of the MCPs depicted on top of the original picture. Resulting culture sizes are visualized in Extended Data Fig. 4 and Supplementary Table 9.

Analysis of culture darkness

The colouration of culture medium as a result of culture growth was measured from the same pictures taken for the analysis of growth (see the section ‘Analysis of growth’; 45 isolates for Pd-1 and 34 isolates for Pd-2). Analyses were carried out in R using EBImage (v.4.3)¹¹². Colour images were transformed into black and white images, and the pixel density (a value from 0 to 1; 1 being white and 0 being black) was recorded in 3 rectangles distributed on the area showing the culture medium (that is, not touching the edge of the culture and also avoiding the centre with fungal growth). The median pixel density among the three rectangles was used as a proxy for culture darkness. For each isolate, the difference in pixel density between the picture taken after 1 week and that taken after 8 weeks was calculated (8 weeks – 1 week, resulting in a positive value if darkness increased, a negative value when darkness decreased). Results are presented in Extended Data Fig. 2 and Supplementary Table 8.

Maps and plotting

Unless otherwise stated, figures were produced in R using functions from base R⁵² and ggplot2 (v.3.5.0)¹¹⁸. Maps (Figs. 1a and 2a) were downloaded as tiles from Stadia Maps (https://stadiamaps.com/) with data by OpenStreetMap (map tiles by Stamen Design under ODbL, under CC BY 4.0) and plotted using the ggmap package (v.3.0.2)¹¹⁹. They represent maps of type ‘stamen_terrain_background’, with colours representing natural vegetation colours and elevation (through shading). Maps for Figs. 3a and 4a,b were obtained from the R packages rworldmap (v.1.3.8)¹²⁰ and rworldxtra (v.1.0.1)¹²¹. Bat species distribution were recovered from the International Union for Conservation of Nature (IUCN) website¹²² as shape files (https://www.iucnredlist.org/) and plotted in R. Inkscape¹²³ (v1.1.1; https://inkscape.org) was used to optimize visualizations.

Statistical analyses

We tested the relationship between clade identity (Pd-1 or Pd-2; binary response) and environmental factors, including bat species (nominal variable), latitude and longitude (both numerical variables were scaled). For this analysis, closely related bat species that are challenging to identify during hibernation were pooled together. Furthermore, to improve model convergence and to ensure identifiability of the model, only the most commonly infected species, that is, species from which we isolated Pd at ten or more sites were included in the model. This resulted in a dataset comprising 4,295 isolates from 231 sites in Europe, broken down as follows: M. mystacinus (28 isolates, 10 sites); M. dasycneme (107 isolates, 11 sites); M. daubentonii (111 isolates, 17 sites); Myotis species complex (322 isolates, 38 sites); and M. myotis/blythii (3,727 isolates, 187 sites). We fit a Bayesian hierarchical model using clade identity as our response variable and a logit link function. Bat species, latitude and longitude were included as population-level effects (analogous to a fixed effect in a frequentist approach), whereas samples (nested within sites) and sites were included as group effect (analogous to a random effect in a frequentist approach) with random intercept. The group-level effects were used to account for variations between sites and between samples within site. Given that both P. destructans clades were found in roughly equal numbers in M. mystacinus (13 and 15 isolates for Pd-1 and Pd-2, respectively), this bat species was used as the baseline category. To estimate the model parameters and to perform Bayesian inference, the model was fitted using 8 chains with 10,000 iterations each. To improve convergence, the ‘adapt_delta’ parameter, controlling the acceptance rate of the algorithm, was set to 0.99. The first 1,000 iterations of each chain were discarded as warm-up (burn-in) to ensure convergence. Chains were sampled using the NUTS (No-U-Turn Sampler) algorithm in Stan (https://mc-stan.org/) with the brms (v. 2.20.3)¹²⁴ package in R. Effective sample size measures (Bulk_ESS and Tail_ESS) were calculated to assess the quality of the draws, and the potential scale reduction factor (Rhat) was used to evaluate the convergence of the chains as previously proposed¹²⁵. Collinearity among the explanatory variables was assessed using the generalized variance inflation factor, computed through the ‘check_collinearity’ function available in the ‘performance’ (v.0.12.4)¹²⁶ package in R.

Analyses of barcoding genes

We investigated two universal barcoding genes that are single copy genes, the translation elongation factor 1α (TEF1) and the DNA-directed RNA polymerase II subunit B (RPB2)¹²⁷ (RPB2). We retrieved sequences of these two genes from the full genomes of 18 isolates (based on Illumina read mapped on reference genome Gd293; see the section ‘Analyses of Illumina reads (individually tagged isolates)’) and from the filtered Pool-seq datasets of each P. destructans clade, Pd-1 and Pd-2 containing 69 and 63 isolates, respectively (also mapped on Gd293; see the section ‘Analyses of Illumina reads (Pool-seq data)’). This was done to search for fixed positions that could discriminate the P. destructans clades.

Based on sequences from 150 isolates (69 and 63 from Pool-seq data; 18 from full genomes), we identified six substitutions in TEF1 (at positions 3155612, 3155783, 3156573, 3157304, 3157318 and 3157602 on contig 34 of Gd293) and three in RPB2 (at positions 523497, 525708 and 526300 on contig 34 of Gd293) that fully discriminated Pd-1 and Pd-2 clades. These nine substitutions were fixed in clades. Based on the sequences from the 18 isolates mentioned above and the outgroup Gd267, both clades formed monophyletic groups when building a phylogenetic tree for each gene separately (data not shown).

Based on these discriminating sites in TEF1 and RPB2, we searched in published nucleotide sequences in NCBI for P. destructans to classify isolates to either clade Pd-1 or Pd-2. Using this approach, a set of isolates from the Czech Republic (n = 3), Portugal (n = 13) and South Korea (n = 2) could be identified as belonging to clade Pd-2 (Supplementary Table 14), for example. These data combined with the data presented in the main text confirmed that Pd-1 and Pd-2 co-occur in Europe, but thus far, only Pd-2 has been found in East Asia (one isolate from China, one from Mongolia, two from South Korea). This result suggests that Pd-1 is rarer in East Asia or perhaps even absent.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Two distinct host-specialized fungal species cause white-nose disease in bats

Sample and field data collection

Laboratory materials and methods

Cultures and genotyping

DNA extraction for MinION and Illumina reads

Sample preparation for MinION nanopore sequencing

Sample preparation for Illumina sequencing (individual isolates)

Sample preparation for Illumina sequencing (Pool-seq)

Analyses of multilocus genotypes

Analyses of genomes de novo assembled from long-reads

Base calling

Genome assembly

Repeat annotation

Genome annotation

BUSCO genes

Sequence divergence

Synteny

Analyses of Illumina reads (individually tagged isolates)

Data checking and mapping

Base calling, genotyping and filtering

Diversity and differentiation calculation

Phylogenetic relationships

BUSCO phylogeny

Whole-genome phylogeny

Analyses of recombination

Modelling demographic history

Analyses of Illumina reads (Pool-seq data)

Data checking and mapping

Data pooling

Genotyping

Data filtering

Filtering repetitive DNA

Filtering based on read depth

Filtering outcomes

Calculation of the index of differentiation F ST

Variations in growth rates and growth-medium colouration

Analysis of growth

Analysis of culture darkness

Maps and plotting

Statistical analyses

Analyses of barcoding genes

Reporting summary

Most Popular

Recent Comments

ABOUT US

POPULAR POSTS

POPULAR CATEGORY

Calculation of the index of differentiation F
_ST