Accurate eQTL prioritization with an ensemble-based framework

Hum Mutat. 2017 Sep;38(9):1259-1265. doi: 10.1002/humu.23198. Epub 2017 Apr 19.

Abstract

We present EnsembleExpr, a novel ensemble-based computational framework that achieved the best performance in the "eQTL-causal SNPs" challenge of the Fourth Critical Assessment of Genome Interpretation, a challenge for identifying expression quantitative trait loci (eQTLs) and prioritizing their gene expression effects. eQTLs are genome sequence variants that result in gene expression changes and are thus prime suspects in the search for causal contributors to complex traits. When trained on data from massively parallel reporter assays, EnsembleExpr accurately predicts reporter expression levels from unseen regulatory sequences and identifies sequence variants that exhibit significant changes in reporter expression. Compared with other state-of-the-art methods, EnsembleExpr achieved competitive performance when applied to eQTL datasets determined by other protocols. We envision EnsembleExpr as a resource to help interpret noncoding regulatory variants and prioritize disease-associated mutations for downstream validation.

Keywords: bioinformatics; eQTL analysis; genetics; machine learning; variation.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Gene Expression Profiling / methods
  • Gene Expression Regulation
  • Genetic Predisposition to Disease
  • Humans
  • Models, Genetic
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci*
  • Software