GWAS summary statistics
GWAS summary statistics for 305 continuous traits were downloaded from the Neale Lab (http://www.nealelab.is/uk-biobank/; v3). These regressions were run on inverse rank normal-transformed phenotypes in a subset of the UK Biobank consisting of approximately 360,000 individuals and included age, age2, inferred sex, age × inferred sex, age2 × inferred sex and principal components 1–20 as covariates. We used 5 × 10−8 as the threshold for genome-wide significance unless otherwise stated.
LoF burden test summary statistics
Summary statistics for 292 LoF burden tests were downloaded from Backman et al.4. Two-hundred and nine traits overlapped with traits for which we had GWAS summary data (Supplementary Table 1). Burden genotypes were calculated by calling individuals homozygous for the non-LoF variant at all sites as being homozygous non-LoF, calling individuals homozygous for the LoF allele at any site as being homozygous LoF, and calling all other individuals heterozygotes. Burden tests were run using REGENIE59 on inverse rank normal-transformed phenotypes. For our primary analyses, we used the result of the burden test with mask M1, which only includes variants that are predicted as being LoFs using the most stringent filtering criteria and an allele frequency upper bound of 1%. For analyses including missense variants, we used mask M3, which also includes ‘likely damaging’ missense variants, again upper bounding the frequency of included variants at 1% (see ref. 4 for more details). We used a per-trait genome-wide significance threshold of 2.7 × 10−6, derived by applying a Bonferroni correction to a significance threshold of 0.05 for testing approximately 18,000 genes per trait.
A subset of genetically uncorrelated traits
The set of 209 quantitative traits included some that were highly correlated, such as sitting height and standing height. For certain analyses, we selected a subset of 27 traits that were not highly correlated by intersecting the 209 traits with those analysed by Mostafavi et al.45 (Supplementary Table 1). In brief, the trait list was pruned to ensure that all pairwise genetic correlations, as reported by the Neale laboratory, were below 0.5, prioritizing traits with higher heritability. Biomarkers were excluded from this subset because their genetic correlations with other traits were not provided by the Neale laboratory. Genetic and phenotypic correlations between these 27 traits as reported by the Neale laboratory are listed in Supplementary Table 2. Genetic correlations ranged between −0.3096 and 0.2742. Phenotypic correlations for eight trait pairs were missing from the Neale laboratory (all including the trait ‘heel quantitative ultrasound index, direct entry’). The remaining phenotypic correlations ranged between −0.2117 and 0.1972.
We used this subset of traits to ensure that our results (Figs. 3b–d and 4b,c and Extended Data Figs. 1a–c and 3a–c) were not driven by many correlated phenotypes all sharing the same underlying biology. As such, slight correlations between these phenotypes should not substantively affect our results or interpretations.
Defining GWAS loci
For a systematic comparison of discoveries between GWAS and burden tests (shown in Fig. 1c,d), we grouped GWAS variants into large, non-overlapping genomic loci. This approach avoids multiple counting of the same GWAS genes, as nearby hits within a locus may map to the same gene, and it provides a conservative estimate of the overlap between GWAS and burden test results as described below.
We focused on 151 quantitative traits with at least one burden test hit and one GWAS hit. For each trait, we analysed the set of LD-clumped hits (\(P < 5\times 10^-8\), clumping \(r^2 < 0.1\)) from 8,136,100 filtered SNPs provided by Mostafavi et al.45. A secondary analysis (Supplementary Figs. 11–13) used the same LD-clumping pipeline but with a stricter threshold of \(r^2 < 0.01\).
For each trait, we began the grouping procedure with the most significant hit and iteratively processed all hits until they were assigned to a locus. For each hit, we included all independent hits with larger P values (lower significance) within 1 Mb to form a locus. The locus size was then expanded to ensure that no other hit was within 1 Mb of any variant already included in the locus. After completing one locus, we moved on to the next most significant hit that had not yet been assigned to any locus. Finally, we assigned overlapping genes to each locus, focusing on the 18,524 protein-coding genes analysed in the LoF burden test.
In Fig. 1c, we show the ranking of genome-wide significant genes from the burden test and the ranking of their corresponding GWAS loci, based on the P value of the most significant GWAS variant within each locus. In Fig. 1d, we plot the P value of the most significant GWAS variant within each locus on the x axis and the P value of the most significant gene from the burden test within the same locus on the y axis.
In a subset of analyses, we included only the top GWAS loci to match the statistical power of the burden test for gene discovery. We illustrate our procedure with the example of standing height. The LoF burden test for standing height identified 82 significant genes (\(P < 2.7\times 10^-6\), to account for the 18,524 genes tested). The GWAS analysis identified 3,374 nearly independent hits. Following the grouping procedure outlined above, these hits were consolidated into 382 loci (median size of 3.2 Mb). We ranked these loci by the minimum P value within each locus. Starting with the top-ranked locus, we iteratively added GWAS loci until we selected 82 genes. From each locus, we selected all genes that were significant in the LoF burden test. If no such genes existed, we selected the gene with the smallest burden test P value.
This procedure ensures that our analysis of the overlap between burden test and GWAS discoveries is conservative. The overestimation arises first from prioritizing genes based on burden test P values and second from using large GWAS loci, which may contain more than one causal gene, thereby increasing the likelihood of overlap with burden test results.
Comparing GWAS and LoF burden tests in LD blocks
To avoid exacerbating dissimilarities between LoF burden tests and GWAS caused by mislocalization of GWAS signals, we also performed analyses at the LD block level. We downloaded bed files containing the coordinates of approximately independent LD blocks from ref. 60. For each trait, we computed the minimum GWAS P value of variants within each block and compared that with the minimum LoF burden test P value for all genes that overlapped any part of that block. In a small number of cases, the smallest LoF burden test P value in two adjacent blocks would be the same because a single highly significant gene overlapped both blocks. This generally reduced the correlation between the minimum P values of GWAS and LoF burden tests, and so we dropped all such blocks to be conservative.
Ranking loci with conditionally independent hits
To evaluate the robustness of GWAS locus ranking, we also performed SNP selection using COJO61 instead of LD clumping. Starting from the set of 8,136,100 filtered SNPs (described above), we ran COJO for each trait using the –cojo-p 5e-8 and –cojo-slct options. The output is a set of conditionally independent SNPs along with their co-estimated effect sizes, which we used to define GWAS loci as described above for LD-clumped SNPs. Results based on this alternative approach are presented in Supplementary Figs. 8–10.
Ranking genes in GWAS by MAGMA P value
We used MAGMA57 to obtain gene-level P values from GWAS data. We generally followed the suggestions from ref. 62. In brief, as recommended by PoPS62, we assigned SNPs to genes using magma –annotate with the default settings, which only uses SNPs inside of gene bodies. These were then combined using the 1000 Genomes EUR63 LD panel distributed with MAGMA to obtain gene-level P values using the –gene-model snp-wise=mean option.
Ranking genes in GWAS by PoPS score
We also obtained gene-level scores using PoPS62. PoPS uses gene features to predict the gene-level scores produced by MAGMA. We used the MAGMA results as described above, and then downloaded approximately 40,000 features derived from gene expression datasets from GitHub (https://github.com/FinucaneLab/gene_features) as listed as the source of data in ref. 62. Additional features from protein–protein interaction networks and pathway membership are described in ref. 62, but only the expression features were available in the GitHub repository. Furthermore, there was a bug where features from different datasets had the same name causing PoPS to crash. We used a custom script to provide a unique name to each feature provided with PoPS. We then used PoPS with the default setting and used the output PoPS_Score from the resulting *.pred files to rank genes.
Simulating GWAS in smaller samples
To simulate the results of performing smaller GWAS, we sampled summary statistics in a subsample of size \(n\), \(\hat\alpha ^\mathrmsub,\) conditioned on summary statistics in the full sample of size \(N\), \(\hat\alpha ^\rmfull\), as
$$\hat\alpha ^\mathrmsub\rm\hat\alpha ^\mathrmfull\sim \mathrmNormal\,\left(\hat\alpha ^\mathrmfull,\fracN-nnSE(\hat\alpha ^\mathrmfull)^2\right)$$
(1)
and inflated the standard errors by a factor of \(\sqrtN/n\). For a derivation see ref. 64, supplementary material section 3.2. We performed the sampling independently across 8,136,100 filtered SNPs (described above) that passed our filtering. This independent sampling alters the LD structure between linked SNPs, but as we use these statistics in analyses that depend only on the most significant SNP within a locus, this effect on LD should be inconsequential for our qualitative conclusions.
Ranking GWAS loci using MAF thresholds
For Supplementary Figs. 26–28, we considered all 8,136,100 filtered SNPs (described above). We then filtered to only those SNPs equal to or below a given MAF threshold (0.01, 0.1 or 0.5, which includes all SNPs). We then constructed GWAS loci as described above and ranked loci by the minimum P value in that locus.
Ranking loci by largest significant effect size
For Supplementary Figs. 29–31, we again considered all 8,136,100 filtered SNPs (described above). We then constructed GWAS loci as described above but ranked loci by taking the largest absolute effect size among the genome-wide significant SNPs in each locus. We ranked loci for burden tests by taking the largest absolute effect size among all genome-wide significant genes that overlapped the locus.
Association study model
We combined population genetics and statistical genetics models to understand how natural selection affects variants based on their trait specificity and trait importance. Our model assumes that traits are under stabilizing selection based on prevailing hypotheses36,38,39 and uses standard population genetics theory33,65,66,67,68. The details of our model are outlined in the Supplementary Appendices.
Unbiased estimates of trait importance
In several analyses we require estimates of trait importance, either \(\alpha ^2\) from GWAS or \(\gamma ^2\) from LoF burden tests. The details in both cases are identical, so here we describe \(\gamma ^2\). The naive estimator of squaring the LoF burden test estimated effect size, \((\hat\gamma )^2\), is biased. Worse, this bias is anticorrelated with the frequency of the variant, which results in spurious correlations between the biased estimates and various gene properties such as \(s_\mathrmhet\).
To derive an unbiased estimator, we appealed to standard statistical genetics theory69 to assume that LoF burden estimates are approximately normally distributed about their true values with noise dependent on their standard errors. In particular, for a gene with standard error \(s\) and effect size estimate \(\hat\gamma \), we have that \(\hat\gamma \sim \mathrmNormal(\gamma ,s^2)\) approximately. This approximation is widely used for GWAS and was recently confirmed to be accurate for LoF burden tests46. It is then a routine calculation to check that \((\hat\gamma )^2-s^2\) is an unbiased estimator of \(\gamma ^2\).
Although we focus exclusively on quantitative phenotypes in this study, we note that this approximation may not be valid for effect size estimates from logistic regression applied to case–control data.
LoF burden summary statistics binned by \(\boldsymbols_\bfhet\)
When comparing LoF burden summary statistics (standard errors, \(z^2\), and unbiased estimates of \(\gamma ^2\)) to \(s_\mathrmhet\), we used \(s_\mathrmhet\) values inferred in ref. 35 and downloaded from ref. 70. We binned genes by \(s_\mathrmhet\) into 100 bins, each with approximately 184 genes. Within each bin, we averaged the respective summary statistics (for example, unbiased estimate of \(\gamma ^2\)) across traits and genes. To make sure that our results were not driven by redundant traits, we used our 27 genetically uncorrelated traits for these analyses. For heritability enrichment (Fig. 5b), we used the fact that heritability should be proportional to \(z^2-1\) (Supplementary Appendix B). Within each bin of genes, we then computed the average \(z^2-1\) in that bin relative to the average of \(z^2-1\) across all genes for each trait. This produced a trait-level enrichment for each bin, and by using the empirical standard deviation of the relative \(z^2-1\) within the bin, we could also obtain an empirical standard error for the enrichment. We then obtained an overall enrichment for each bin, by taking an inverse-variance weighted average across traits. After this averaging, the mean heritability enrichment across genes need not be one. As such, we renormalized the estimates to average to one.
ATAC peak specificity
We downloaded all ATAC-seq files from ChIP-Atlas71 that contained more than 5,000,000 mapped reads and identified at least 5,000 peaks. Across all files, overlapping peaks were combined using bedtools merge72. This yielded a total of 2,131,526 peaks. Samples other than blood samples were grouped into 17 tissues based on their annotations in ChIP-Atlas: adipocyte (146 samples), bone (190 samples), breast (815 samples), cardiovascular (559 samples), digestive (417 samples), epidermis (661 samples), gonad (138 samples), kidney (375 samples), liver (191 samples), lung (1,679 samples), muscle (118 samples), neural (1,349 samples), pancreas (322 samples), placenta (48 samples), pluripotent (1,895 samples), prostate (312 samples) and uterus (255 samples). In addition, samples with any of the following annotations were categorized as T cell (1,356 samples): CD4+ T lymphocytes, CD4+ T cells, CD8+ T lymphocytes, CD8+ T cells, fetal naive T cells, γδ T cells, naive T cells, T cells, chimeric antigen receptor T cells, follicular helper T cells, helper 0 T (TH0) cells, TH17 cells, TH1 cells, TH2 cells, TH9 cells or T lymphocytes. Samples with any of the following annotations were categorized as erythroid (102 samples): erythroid progenitors, erythroid cells or erythroblasts. Ultimately, this resulted in 19 tissues or cell-type categories.
A peak was considered to be present in a tissue if more than 5% of samples contained the peak. In downstream analyses, we used both the ‘number of shared tissues’ and ‘peak intensity’. We calculated the number of shared tissues by considering all peaks in the relevant tissue for a given trait (for example, bone for height) and then counting the number of tissues in which that peak was present. In particular, we only considered peaks that are present in the relevant tissue. We calculated peak intensity as the fraction of samples within the focal tissue that contain the peak.
Gene expression specificity
We compiled estimates of gene expression in 17 tissue or cell types, which were intended to overlap with the categorization of ATAC-seq peaks when possible. All tissues that were ultimately matched to traits (see below) were included in both our ATAC-seq tissues and our expression tissues, but there are some differences between the remaining tissues. Average gene expression transcripts per million (TPM) of the following tissues were downloaded and extracted from the Human Protein Atlas73 tissue gene data (rna_tissue_hpa.tsv.zip): adipose tissue, breast, heart muscle, colon, skin, ovary, kidney, liver, lung, skeletal muscle, amygdala, pancreas, placenta and prostate. Average gene expression TPM of the following cell types were downloaded and extracted from the Human Protein Atlas single-cell type data (rna_single_cell_type.tsv.zip): erythroid cells and T cells. Average gene expression TPM of human bone samples was downloaded from the Gene Expression Omnibus74 accession GSE106292 (refs. 75,76).
In each tissue, genes with more than 10 TPM were considered to be ‘expressed’. We then restricted our analyses to genes expressed in the trait-relevant tissue. We computed an expression specificity score by taking the expression level in TPM in the trait-relevant tissue divided by the sum of expression levels across all 17 tissues. This provided an expression specificity score for every gene expressed in the trait-relevant cell type. For analyses involving expression specificity bins, we took all of these expression specificity scores across all nine trait–tissue pairs, computed quintiles and then assigned each gene for a given trait–tissue pair to its quintile.
Linking traits to tissues
To identify which tissue (or cell type) is predominantly associated with a given trait, we ran S-LDSC9,41 to partition the heritability of all of our traits that had an estimated heritability of more than 0.04. We used annotations for 19 tissues and cell types constructed from our ATAC-seq analysis described above, along with the LDSC baseline v1.1 covariates. Our aim was to identify trait–tissue pairs in which heritability could clearly be explained by one tissue as opposed to multiple tissues. As such, we only retained traits that had a tissue with an LDSC τ with a z-score of more than 4.5 and had more than 40% of their heritability explained by variants in ATAC-seq peaks of the corresponding tissue. If more than one trait was assigned to the same tissue, we only kept genetically uncorrelated traits (\(r^2 < 0.04\)). This resulted in nine trait–tissue pairs (Supplementary Table 1): mean corpuscular volume (30040_irnt) → erythroid, reticulocyte percentage (30240_irnt) → erythroid, eosinophil percentage (30210_irnt) → T cell, lymphocyte count (30120_irnt) → T cell, standing height (50_irnt) → bone, heel bone mineral density (3148_irnt) → bone, glucose (30740_irnt) → pancreas, creatinine (30700_irnt) → liver, and alanine aminotransferase (30620_irnt) → liver.
Regression of burden z
2 on expression specificity
For each of the nine trait–tissue pairs described above, we performed a linear regression of the burden \(z^2\) for all genes expressed in the top tissue on the expression specificity of genes, binned into quintiles as described earlier. We included the unbiased estimates of the trait importance of genes (defined above) as a covariate. For each specificity bin, we calculated an inverse-variance weighted average of the regression coefficients across all nine traits, with standard errors computed as the square root of the reciprocal of the total weight. The results, shown in Supplementary Fig. 36, demonstrate that the burden test prioritization of specifically expressed genes in Fig. 3e is not driven by differences in the importance of genes across specificity bins.
S-LDSC analysis using ATAC-seq peaks
For each trait–tissue pair, we ran S-LDSC9,41 to estimate the heritability enrichment of tissue-specific ATAC-seq peaks. To this end, we categorized ATAC-seq peaks present in each tissue into five bins based on their presence in other tissues: present in 1–2 tissues, present in 3–8 tissues, present in 9–15 tissues, present in 16–18 tissues and present in all 19 tissues. Also, we categorized ATAC-seq peaks present in each tissue into five bins based on their intensity. The size of these bins were set to match the sizes of the tissue-specificity-based bins. We included the annotations based on ATAC peak tissue specificity and peak intensity bins with the LDSC baseline v1.1 model and used S-LDSC v.1.0.1 on HapMap3 SNPs77. In all analyses, we report \(\tau /h^2\), which represents the change in the proportion of heritability explained by a single variant caused by toggling the annotation of that variant from 0 to 1 while keeping all other covariates included in the regression fixed.
S-LDSC analysis using coding variants
We downloaded the variant annotation file (variants.tsv.bgz) from the Neale laboratory website (http://www.nealelab.is/uk-biobank/). We used the consequence information in the file, which corresponds to Ensembl Variant Effect Predictor (v85)78, for annotating variants. Specifically, we classified variants as being coding if their most severe consequence was any of:
For each trait–tissue pair, we ran S-LDSC9,41 to estimate the heritability enrichment of coding variants as a function of expression specificity. We included the expression specificity bin (as defined above) as an annotation in the S-LDSC model. We also categorized genes into five equally sized bins based on their expression level in the tissue of interest, and all the coding variants were categorized into one of these five bins based on the expression level of the corresponding genes. These annotations were also included in the S-LDSC model. In addition, we used the covariates in the baseline v1.1 model and restricted our analysis to HapMap3 SNPs77. All analyses were run with S-LDSC v.1.0.1. As described above, we always report \(\tau /h^2\).
LoF burden summary statistics binned by \(\boldsymbol\mu \boldsymbolL\)
Analyses comparing LoF burden summary statistics to \(\mu L\) were performed analogously to the analyses comparing the summary statistics to \(s_\mathrmhet\). As a proxy for μL, we downloaded the expected number of segregating LoFs for each gene as calculated in gnomAD (v2)79 from ref. 70. To show that \(\mu L\) is essentially driven by coding DNA sequence (CDS) length, we downloaded CDS lengths for MANE select canonical transcripts (genome build GRCh38) from Ensmbl80 and correlated them with the expected number of segregating LoFs from gnomAD79 (Supplementary Fig. 44).
Computing frequency spectra given \(\boldsymbols_\bfhet\)
To simulate under our model, we required the distribution of allele frequencies for a given selection coefficient. We assumed a stabilizing selection model, which is approximately equivalent to homozygotes having a relative fitness of 1 and heterozygotes having a fitness of 1-shet33,65,66,67. We used fastDTWF81 to compute likelihoods under this model. We assumed an equilibrium population of 20,000 diploids, and computed allele frequency distributions along a grid of 50 shet values from 10−7 to 0.05 evenly spaced on the log scale. We used 1.25 × 10−8 as the per-generation mutation rate. We considered a model where the ancestral allele is known by using the no_fix=True option in fastDTWF. In addition, fastDTWF has two parameters that control the accuracy of its approximation. On the basis of the recommendations of ref. 81, we set dtwf_tv_sd to 0.1 and dtwf_row_eps to 10−8.
Simulating realized heritability
To generate Extended Data Fig. 2b, we simulated 50,000 unlinked variants from our stabilizing selection model. We considered 1,000 values of \(s_\mathrmhet\) log-uniformly spaced between 10−7 and 2.3 × 10−4. For each value of \(s_\mathrmhet\), we then simulated 50 variants by drawing 50 allele frequencies from the allele frequency distributions that we computed as described above. To model the slight differences between population and GWAS sample allele frequencies, we then drew a GWAS sample allele count for each variant as a \(\mathrmBinomial(\mathrm600,000,f)\) random variable, where \(f\) was the population frequency, and 600,000 was chosen to match the roughly 300,000 diploids in the UK Biobank. These allele counts were then normalized to obtain GWAS sample allele frequencies, \(\widetildef\). For this simulation, we assumed that all variants have the same trait specificity. This makes \(\alpha ^2\) on the focal trait proportional to \(s_\mathrmhet\), so we set the realized heritability to \(2s_\mathrmhet\,\widetildef\,(1-\widetildef)\), and normalized all results relative to the maximum simulated realized heritability. Likewise, effect sizes were reported relative to the maximum simulated effect size.
Computing pleiotropy of GWAS hits
To investigate the pleiotropy of top versus weak GWAS hits, we considered all of the 27 uncorrelated traits that had at least 100 GWAS hits, leaving 18 traits. For each trait, we grouped the hits into four quartiles based on variant P values, with quartile 1 containing the most statistically significant hits and quartile 4 containing the least. For each hit, we calculated the number of traits (out of 18) in which the variant was a hit and computed the mean values within each quartile.
Simulating pleiotropy of GWAS hits
To simulate the effects of genetic drift on the apparent pleiotropy of GWAS hits, we simulated GWAS summary statistics. To match the real data described above, we considered 18 traits and simulated effect sizes for 10 million not necessarily segregating positions. We simulated the effect sizes independently for each position, and drew the vector of squared effect sizes for variant j,\(\vec\alpha ^2_j\in \mathbbR^18\) as
$$\vec\alpha ^2_j\sim \frac10^-7f\times \exp \3f\times \mathrmNormal(\bf,p\bfI+(1-p)\bf11^T)\$$
where the exponentiation is performed element-wise, and \(f\) and \(p\) are parameters that affect the range of different total effect sizes, \(\vec\alpha ^2_j_1\), and distribution of trait specificities.
We then assumed that the strength of selection against the variant was \(\vec\alpha ^2_j_1\). We obtained the MAF for each variant by drawing from the variant frequency distribution with the closest \(s_\mathrmhet\), computed as described above.
Finally, we simulated a GWAS by assuming that the observed association statistic for each trait was independently normally distributed about its true value. For example, for trait \(k\) and the variant at position \(j\) we have:
$$\hat\alpha _jk\sim \mathrmNormal\,\left(\sqrt\vec\alpha ^2_jk,\frac1\sqrt2N_\mathrmeff\times \mathrmMA\rmF_j(1-\mathrmMA\rmF_j))\right)$$
where \(N_\rmeff\) is a scaling factor that captures both the amount of environmental noise contributing to the trait as well as the sample size. We converted these to P values by taking \(2N_\rmeff\rmMAF_j(1-\rmMAF_j)\hat\alpha _jk^2\) as a squared z-score, which is chi-squared distributed with 1 degree of freedom under the null. We considered a variant to be a genome-wide significant hit if its P value was smaller than a parameter, \(t\).
This simulation approach has four free parameters. In the main text, we used \(f=0.33\), \(p=0.5\), \(N_\mathrmeff=10000000\) and \(t=10^-5\). Although these parameters are related to standard GWAS parameters (for example, the GWAS sample size or genome-wide significance threshold), the exact quantitative relationship should not be overanalysed. For example, we assumed that the strength of selection is exactly \(\vec\alpha ^2_j_1\). If instead there was some scaling factor, that could be absorbed into \(N_\rmeff\). Similarly, there is a qualitative inverse relationship between the effects of \(t\) and \(N_\rmeff\) (for example, lower \(t\) has a similar effect to increasing \(N_\rmeff\)), making the exact setting of either parameter somewhat arbitrary. We chose the values that we used here to roughly match the distribution of selection coefficients inferred from real GWAS data82, as well as the observed patterns of MAF and pleiotropy in the UK Biobank GWAS results. In Supplementary Figs. 45–48, we vary each of \(N_\rmeff\), \(t\), \(p\) and \(f\), respectively, while holding the others fixed to show that our qualitative results are not sensitive to the particular simulation parameters that we chose.
AMM analysis
We ran AMM47 to estimate heritability enrichments for gene sets, following the workflow previously described (https://github.com/danjweiner/AMM21; commit 524c620). We binned genes into 100 approximately equally sized bins based on \(s_\mathrmhet\) as described above and used these bins as our gene sets. AMM requires an estimate of the probability that a SNP is acting via the closest gene, second closest gene, and so on. For more distant genes, there is insufficient power to estimate these probabilities so AMM recommends combining these into bins. We followed the recommended binning, and then used the probabilities estimated in the original AMM paper47 (supplementary table 5 of ref. 47). AMM recommends using LDSC baseline covariates in all models, for which we used v2.3. We restricted our analysis to HapMap3 variants. The results in Fig. 5d are the inverse-variance weighted average of the heritability enrichment estimates across our 27 genetically uncorrelated traits. These inverse-variance weighted estimates of the average enrichments do not necessarily need to average to one in contrast to the true enrichments. As such, we renormalized the estimated enrichments so that they sum to one.
Correlation of GWAS hit probability and \(\boldsymbols_\bfhet\)
We analysed the GWAS hits curated in our previous study45, filtered to a set of 6,971,256 SNPs that passed quality control procedures. This set excluded lead GWAS SNPs in LD (\(r^2 > 0.8\)) with variants predicted to have protein-altering consequences, to condition on putatively non-coding trait associations. We focused on 15,591 approximately independent GWAS hits associated with our 27 uncorrelated traits, for which estimates of shet for the nearest gene were available. We performed logistic regression to differentiate GWAS hits from 100,000 SNPs randomly sampled from the same 6,971,256 SNP set. The shet values of the nearest genes were used as the predictor, categorized into 100 percentile bins. As in our previous work, the regression model included additional covariates: MAF, LD score, gene density and the absolute distance to the nearest transcription start site. We also incorporated dummy variables representing 20 quantiles of each of these covariates (MAF, LD score, gene density and distance to the transcription start site). Results of this analysis are presented in Supplementary Fig. 49. The covariate data were obtained from Mostafavi et al.45.
Correlation of \(\hat\boldsymbol\gamma ^\bf2\) and number of GWAS hits
To avoid double counting GWAS hits due to LD, we restricted our analysis to approximately independent hits. For each trait, we analysed the set of LD-clumped hits (\(P < 5\times 10^-8\), clumping \(r^2 < 0.1\)) from 8,136,100 filtered SNPs provided in ref. 45. We then assigned each GWAS hit to the closest gene (using the midpoint of genes as released with AMM47). For each trait, we then correlated the number of GWAS hits assigned to each gene with our unbiased estimate of the trait importance of that gene, \(\hat\gamma ^2\), based on the LoF burden test results. To make certain that our results were not driven by differences between genes with no GWAS signal versus genes with any GWAS signal, we also computed correlations between the number of GWAS hits assigned to each gene and \(\hat\gamma ^2\) restricting to genes with at least one GWAS hit.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

