Computational identification of regulatory mechanisms affected by noncoding variants associated with late-onset Alzheimer’s disease.

Alexandre Amlie-Wolf, Mitchell Tang, Jessica King, Beth A. Dombroski, Li-San Wang, and Gerard D. Schellenberg

Background: More than 20 single nucleotide polymorphisms (SNPs) associated with late-onset Alzheimer’s disease (LOAD) have been identified by genome-wide association studies (GWAS). However, these SNPs only tag nearby genetic variants in linkage disequilibrium (LD) and may not themselves be functional. Moreover, all 21 of the significant SNPs identified in phase 1 of the International Genomics of Alzheimer’s Project (IGAP) meta-analysis (Lambert et al. 2013) lie in non-protein-coding regions, implicating gene regulatory mechanisms as the basis of the association signal. This suggests a need for functional annotation of expanded sets of SNPs spanning the LD regions tagged by the IGAP SNPs in order to identify the truly causal variants, their regulatory mechanisms, and their target genes.

Methods: We developed a reproducible bioinformatics pipeline for the proposed analysis. First, we built an expanded set of variants by finding SNPs near the tagging SNP that meet a ‘locus-wide’ significance threshold and then identifying all SNPs in LD with either the tag SNP or any locus-wide significant SNP. Next, we overlapped the expanded SNP set with enhancers from FANTOM5 and eQTLs from GTEx and identified their respective target genes, characterized the distribution of SNPs across genomic elements such as exons and introns, and incorporated other functional information, including epigenomic data from NIH Roadmap Epigenomics and transcription factor binding sites.

Results: The expanded SNP set from the 21 top IGAP phase 1 hits included 2,126 unique SNPs. We prioritized SNPs that both overlapped enhancers and were eQTLs, and identified several putatively functional SNPs: 4 near CASS4, 28 near CELF1, 2 near EPHA1, 4 near SLC24A4, 128 near HLA-DRB5/HLA-DRB1, and 4 near MS4A6A. Every region except SLC24A4, which harbored fibroblast eQTLs and monocyte enhancers, contained at least one SNP overlapping both monocyte enhancers and whole-blood eQTLs, supporting the hypothesis that the innate immune response is involved in LOAD etiology.

Conclusions: Computational analysis of functional genomic data across hundreds of tissues and cell types identified a small number of putatively causal SNPs for LOAD with strong functional evidence. This set of variants is being further refined and characterized, and we will report more functional results at the AAIC 2016 meeting.
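
The two core steps described in the Methods (expanding each tag SNP into an LD-based set, then prioritizing SNPs that both overlap an enhancer and are eQTLs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the SNP and gene identifiers, the coordinates, and both thresholds are hypothetical placeholders, since the abstract does not state the values used. Enhancers are assumed to be BED-style intervals (0-based, half-open) and SNP positions are assumed to be 1-based.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Enhancer = Tuple[str, int, int, str]  # (chrom, start, end, cell_type), BED-style


def expand_snp_set(
    tag: str,
    pvalues: Dict[str, float],
    ld_r2: Dict[FrozenSet[str], float],
    p_threshold: float,
    r2_threshold: float,
) -> Set[str]:
    """Return the tag SNP, locus-wide significant SNPs, and their LD partners."""
    # Seeds: the tag SNP plus every nearby SNP passing the locus-wide threshold.
    seeds = {tag} | {snp for snp, p in pvalues.items() if p <= p_threshold}
    expanded = set(seeds)
    # Add any SNP in sufficiently strong LD with at least one seed.
    for pair, r2 in ld_r2.items():
        if r2 >= r2_threshold and pair & seeds:
            expanded |= pair
    return expanded


def prioritize(
    snps: Set[str],
    positions: Dict[str, Tuple[str, int]],
    enhancers: List[Enhancer],
    eqtl_genes: Dict[str, Set[str]],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Keep SNPs that overlap at least one enhancer AND are eQTLs for some gene."""
    hits = {}
    for snp in sorted(snps):
        chrom, pos = positions[snp]
        # A 1-based position p lies in the 0-based half-open interval
        # [start, end) exactly when start < p <= end.
        cell_types = sorted(
            {ct for (c, start, end, ct) in enhancers if c == chrom and start < pos <= end}
        )
        genes = sorted(eqtl_genes.get(snp, ()))
        if cell_types and genes:
            hits[snp] = (cell_types, genes)
    return hits


# Toy inputs; every identifier and number below is made up for illustration.
pvalues = {"snp_b": 1e-6, "snp_c": 0.2, "snp_d": 0.5}
ld_r2 = {
    frozenset({"snp_b", "snp_c"}): 0.9,
    frozenset({"snp_tag", "snp_d"}): 0.1,
}
positions = {
    "snp_tag": ("chr1", 150),
    "snp_b": ("chr1", 1000),
    "snp_c": ("chr1", 1200),
    "snp_d": ("chr1", 5000),
}
enhancers = [("chr1", 1100, 1300, "monocyte"), ("chr1", 100, 200, "fibroblast")]
eqtl_genes = {"snp_b": {"GENE1"}, "snp_c": {"GENE1"}}

expanded = expand_snp_set("snp_tag", pvalues, ld_r2, p_threshold=1e-5, r2_threshold=0.8)
hits = prioritize(expanded, positions, enhancers, eqtl_genes)
print(sorted(expanded))
print(hits)
```

In this toy example only `snp_c` is prioritized: `snp_tag` overlaps an enhancer but is not an eQTL, and `snp_b` is an eQTL but overlaps no enhancer. Requiring both lines of evidence is what narrows a large expanded set down to a short list of candidates.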