Combining heterogenous data for prediction of disease related and pharmacogenes

Pac Symp Biocomput. 2014:328-39.

Abstract

Identifying genetic variants that affect drug response or play a role in disease is an important task for clinicians and researchers. Before individual variants can be explored efficiently for their effect on drug response or their relationship to disease, specific candidate genes must be identified. While many methods rank candidate genes using sequence features and network topology, only a few exploit the information contained in the biomedical literature. In this work, we train and test a classifier on known pharmacogenes from PharmGKB; it predicts pharmacogenes on a genome-wide scale using only Gene Ontology annotations and simple features mined from the biomedical literature. The classifier achieves an F-measure of 0.86 and an AUC of 0.860. The top 10 predicted genes are analyzed. Additionally, a set of enriched pharmacogenic Gene Ontology concepts is produced.
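The two reported metrics can be illustrated with a minimal sketch. It assumes the F-measure is the balanced F1 score (the abstract does not say which variant was used) and computes AUC as the probability that a randomly chosen positive gene is scored above a randomly chosen negative one. The function names, labels, and scores are hypothetical toy values for illustration, not data or code from the study.

```python
def f_measure(y_true, y_pred):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = known pharmacogene, 0 = background gene.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]               # classifier confidence per gene
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]   # assumed 0.5 decision threshold

print("F-measure:", round(f_measure(toy_labels, toy_preds), 3))
print("AUC:", round(auc(toy_labels, toy_scores), 3))
```

The F-measure depends on the chosen decision threshold, whereas AUC depends only on how the genes are ranked, which is why the two are usually reported together.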

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence
  • Computational Biology
  • Data Mining / statistics & numerical data
  • Databases, Genetic
  • Databases, Pharmaceutical
  • Gene Ontology / statistics & numerical data
  • Genetic Variation
  • Humans
  • Knowledge Bases
  • Natural Language Processing
  • Pharmacogenetics / statistics & numerical data*