Efficient Prioritization of Multiple Causal eQTL Variants via Sparse Polygenic Modeling

Naoki Nariai; William W Greenwald; Christopher DeBoever; He Li; Kelly A Frazer

doi:10.1534/genetics.117.300435

Genetics. 2017 Dec;207(4):1301-1312. doi: 10.1534/genetics.117.300435. Epub 2017 Oct 26.

Authors

Naoki Nariai¹, William W Greenwald², Christopher DeBoever², He Li³, Kelly A Frazer³,⁴

Affiliations

¹ Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, California 92093-0761.
² Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California 92093-0761.
³ Institute for Genomic Medicine, University of California, San Diego, La Jolla, California 92093-0761.
⁴ Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, California 92093-0761.

Abstract

Expression quantitative trait loci (eQTL) studies have typically used single-variant association analysis to identify genetic variants correlated with gene expression. However, this approach has several drawbacks: causal variants cannot be distinguished from nonfunctional variants in strong linkage disequilibrium, combined effects from multiple causal variants cannot be captured, and low-frequency (<5% MAF) eQTL variants are difficult to identify. While these issues could potentially be overcome by using sparse polygenic models, which associate multiple genetic variants with gene expression simultaneously, the predictive performance of these models for eQTL studies has not been evaluated. Here, we assessed the ability of three sparse polygenic models (Lasso, Elastic Net, and BSLMM) to identify causal variants, and compared their efficacy to single-variant association analysis and a fine-mapping model. Using simulated data, we determined that, while these methods performed similarly when one causal SNP was present at a gene, BSLMM substantially outperformed single-variant association analysis for prioritizing causal eQTL variants when multiple causal eQTL variants were present (1.6- to 5.2-fold higher recall at 20% precision), and identified up to 2.3-fold more low-frequency variants as the top eQTL SNP. Analysis of real RNA-seq and whole-genome sequencing data from 131 induced pluripotent stem cell (iPSC) samples showed that the eQTL SNPs identified by BSLMM had higher functional enrichment in DNase I hypersensitive sites (DHSs) and were more often low-frequency than those identified with single-variant association analysis. Our study showed that BSLMM is a more effective approach than single-variant association analysis for prioritizing multiple causal eQTL variants at a single gene.

Keywords: causal variants; eQTLs; sparse polygenic models.
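
The abstract reports results as "recall at 20% precision" without spelling out the computation. The sketch below shows one common reading of that metric, not necessarily the authors' exact procedure: rank variants by a per-variant score (for example, an association statistic or a posterior inclusion probability), then take the highest recall among all rank cutoffs whose precision is at least the threshold. The function name `recall_at_precision`, its parameters, and the toy scores are all hypothetical and chosen only for illustration.

```python
def recall_at_precision(scores, is_causal, min_precision=0.2):
    """Best recall over all rank cutoffs whose precision is >= min_precision.

    scores    -- per-variant evidence (higher means more likely causal)
    is_causal -- 0/1 flags marking the truly causal variants (known in simulation)
    Tied scores are kept in input order; this sketch does not treat ties specially.
    """
    total_causal = sum(is_causal)
    if total_causal == 0:
        raise ValueError("need at least one causal variant")

    # Rank variants from strongest to weakest evidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    best_recall = 0.0
    true_positives = 0
    for k, idx in enumerate(order, start=1):
        true_positives += is_causal[idx]
        precision = true_positives / k  # fraction of the top k that are causal
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_causal)
    return best_recall


# Toy gene with 10 variants, two of them causal (ranked 1st and 6th).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_causal = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(recall_at_precision(toy_scores, toy_causal, min_precision=0.2))
```

In this toy ranking, both causal variants fall within the top six, where precision is 2/6, so the full set is recovered while precision stays above 20%. Under this reading, a method that ranks the second causal variant lower loses recall at the same precision threshold, which is the kind of difference the fold-change comparison in the abstract summarizes.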

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Gene Expression / genetics
Genetic Predisposition to Disease*
Genetic Variation
Genome-Wide Association Study / statistics & numerical data*
Humans
Linkage Disequilibrium
Multifactorial Inheritance / genetics*
Polymorphism, Single Nucleotide / genetics
Quantitative Trait Loci / genetics*
