Inferring enhancer and noncoding RNA dysregulation underlying 2,419 UK Biobank Phenotypes
Alexandre Amlie-Wolf, Liming Qu, Elisabeth E. Mlynarski, Pavel P. Kuksa, Yuk Yee Leung, Christopher D. Brown, Gerard D. Schellenberg, Li-San Wang
The majority of variants identified by genome-wide association studies (GWAS) affect regulatory elements outside coding genes, such as transcriptional enhancers and noncoding RNAs. We developed INFERNO (http://inferno.lisanwanglab.org), which integrates GWAS data with hundreds of functional genomics datasets to identify causal noncoding variants underlying association signals, along with their affected regulatory elements, tissue contexts, and target genes. INFERNO uses the COLOC method to identify colocalized GWAS and target gene eQTL signals overlapping enhancers in matching tissue classes, and it characterizes coregulatory networks of targeted long noncoding RNAs (lncRNAs). It also quantifies empirical enrichments of tissue-specific enhancer overlaps.
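As an illustration of what an empirical enrichment of enhancer overlaps can look like, the following minimal sketch compares the observed number of overlapping variants against counts from randomly sampled background variant sets. This is not INFERNO's implementation: the function names, the uniform background sampling, and the single-chromosome integer coordinates are all assumptions made for the example.

```python
import random

def count_overlaps(positions, intervals):
    """Count positions that fall inside at least one half-open [start, end) interval."""
    return sum(any(start <= p < end for start, end in intervals) for p in positions)

def empirical_p(observed, null_counts):
    """One-sided empirical p-value, with +1 in numerator and denominator
    so the estimate can never be exactly zero."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)

def enrichment_test(variants, background, enhancers, n_samples=1000, seed=0):
    """Hypothetical helper: compare observed enhancer overlaps with overlaps of
    size-matched variant sets drawn uniformly from `background` (an assumption;
    a real analysis would likely match on more than set size)."""
    rng = random.Random(seed)
    observed = count_overlaps(variants, enhancers)
    null_counts = [
        count_overlaps(rng.sample(background, len(variants)), enhancers)
        for _ in range(n_samples)
    ]
    return observed, empirical_p(observed, null_counts)
```

Here `background` stands in for whatever pool of control variants an analysis chooses; the quality of the enrichment estimate depends heavily on that choice.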
We applied INFERNO to summary statistics for 191 case/control and 2,228 quantitative traits across anthropological and health-related phenotypes from the UK Biobank. We identified 1,389,198 significant variants (p ≤ 5 × 10⁻⁸) in 2,298 phenotypes, 42.34% of which were found in multiple phenotypes. We pruned significant variants into a median of 93 independent signals by European linkage disequilibrium (LD) and expanded these into LD blocks with a median of 509 candidate causal variants. While only 1.04% of all candidate variants were in coding exons, 2.77% overlapped FANTOM5 enhancers, 50.60% overlapped Roadmap enhancers, and 46.61% overlapped transcription factor binding sites across phenotypes. On average, variants overlapping FANTOM5 or Roadmap enhancers were associated with 5.6 and 1.9 phenotypes, respectively. A subset of variants affected noncoding RNAs: 0.07% overlapped 12 classes of small RNA and 1.6% overlapped microRNA binding sites for 2,059 miRNAs, including 2,056 affected in multiple phenotypes.
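The first step described here, filtering summary statistics at the conventional genome-wide significance threshold of 5 × 10⁻⁸ and asking how many significant variants recur across phenotypes, can be sketched as follows. The input layout (a dictionary from phenotype name to `(variant_id, p_value)` pairs) and the function names are assumptions for illustration, not the format INFERNO consumes.

```python
from collections import defaultdict

GWS_THRESHOLD = 5e-8  # conventional genome-wide significance cutoff

def significant_variants(sumstats):
    """Map each genome-wide significant variant to the set of phenotypes
    in which it passes the threshold.

    `sumstats` is a hypothetical structure: {phenotype: [(variant_id, p), ...]}.
    """
    hits = defaultdict(set)
    for phenotype, rows in sumstats.items():
        for variant_id, p_value in rows:
            if p_value <= GWS_THRESHOLD:
                hits[variant_id].add(phenotype)
    return dict(hits)

def fraction_shared(hits):
    """Fraction of significant variants that are significant in more than one phenotype."""
    if not hits:
        return 0.0
    shared = sum(1 for phenotypes in hits.values() if len(phenotypes) > 1)
    return shared / len(hits)
```

On real data the same counts would normally be computed with a dataframe library, but the logic is the same: threshold each phenotype, then group by variant.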
INFERNO identified strongly colocalized signals for 3,400 genes spanning all 44 GTEx tissues, 55% of which were supported by enhancer overlaps in the matching tissue categories. These included 522 lncRNAs coregulating an average of 247 genes. We also identified 2,078 significant enhancer overlap enrichments in 31 tissue classes, including lung for asthma, eye for diabetes-related eye disease, heart for atrial fibrillation and hypertension, adipose for waist circumference, and brain for age at completion of education and anxiety. These analyses support the utility of INFERNO for inferring the molecular mechanisms underlying noncoding GWAS signals across a wide range of both case/control and quantitative phenotypes.