Identifying longevity associated genes by integrating gene expression and curated annotations

PLoS Comput Biol. 2020 Nov 30;16(11):e1008429. doi: 10.1371/journal.pcbi.1008429. eCollection 2020 Nov.

Abstract

Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
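The workflow described above, fitting an elastic net penalized logistic regression on annotation-style features and scoring it on held-out genes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic binary features standing in for gene ontology annotations (not GenAge labels), the names `fit_elastic_net_logistic`, `lam`, and `alpha` are hypothetical, and the proximal gradient optimizer is a simple stand-in rather than the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linear_score(x, w, b):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_elastic_net_logistic(X, y, lam=0.02, alpha=0.5, lr=0.5, n_iter=300):
    """Proximal gradient descent on
    mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).
    The intercept b is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(xi, w, b)) - yi
            grad_b += err / n
            for j, xij in enumerate(xi):
                if xij:
                    grad_w[j] += err * xij / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss + ridge term),
            # then soft-thresholding for the lasso term.
            smooth_step = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(smooth_step, lr * lam * alpha)
    return w, b


# Synthetic stand-in data (assumption): 400 "genes", 20 binary annotation
# features. Features 0-2 push toward label 1 ("pro-longevity"), features 3-5
# toward label 0 ("anti-longevity"); the remaining features are pure noise.
rng = random.Random(0)
n_genes, n_features = 400, 20
X, y = [], []
for _ in range(n_genes):
    x = [1 if rng.random() < 0.3 else 0 for _ in range(n_features)]
    signal = sum(x[0:3]) - sum(x[3:6]) + rng.gauss(0.0, 0.5)
    X.append(x)
    y.append(1 if signal > 0 else 0)

# Hold out the last 100 genes; they are never seen during fitting.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = fit_elastic_net_logistic(X_train, y_train)
predictions = [1 if linear_score(x, w, b) >= 0 else 0 for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
n_nonzero = sum(1 for wj in w if wj != 0.0)
print(f"held-out accuracy: {accuracy:.2f}, non-zero weights: {n_nonzero}/{n_features}")
```

In this sketch, `alpha` sets the mix between the lasso and ridge penalties and `lam` sets their overall strength. In practice both would typically be tuned by cross-validation on the training genes only, so that the held-out set remains an unbiased check.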

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Caenorhabditis elegans / genetics
  • Gene Expression*
  • Gene Ontology*
  • Genes, Fungal
  • Longevity / genetics*
  • Machine Learning
  • Reproducibility of Results
  • Saccharomyces cerevisiae / genetics