Biomarker selection and classification of "-omics" data using a two-step Bayes classification framework

Biomed Res Int. 2013:2013:148014. doi: 10.1155/2013/148014. Epub 2013 Sep 11.

Abstract

Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine-learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank the features, so that the top-ranked features are the most likely to be informative for predicting the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model. To obtain the minimum set of the most informative biomarkers, the bottom-ranked features are removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets, including gene expression microarray, single nucleotide polymorphism microarray (SNParray), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed framework performed as well as and, in some cases, better than other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
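
The two steps described in the abstract (rank features, then prune the bottom-ranked ones while checking accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: the abstract does not state the ranking criterion, so the sketch ranks each feature by the holdout accuracy of a single-feature Gaussian Naïve Bayes model; a plain Gaussian Naïve Bayes classifier stands in for the Hidden Naïve Bayes classifier; the data are synthetic; and all function names (`fit_gnb`, `rank_features`, `prune`, and so on) are hypothetical.

```python
import math
import random

def fit_gnb(X, y, cols):
    """Fit a Gaussian Naive Bayes model on the selected feature columns."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for c in cols:
            vals = [r[c] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model

def predict_gnb(model, x, cols):
    """Return the class with the highest log posterior for one sample."""
    def log_post(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[c] - mean) ** 2 / (2 * var)
            for c, (mean, var) in zip(cols, stats)
        )
    return max(model, key=log_post)

def accuracy(X_tr, y_tr, X_val, y_val, cols):
    """Holdout accuracy of a model trained on the given columns only."""
    model = fit_gnb(X_tr, y_tr, cols)
    hits = sum(predict_gnb(model, x, cols) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)

def rank_features(X_tr, y_tr, X_val, y_val):
    """Step 1 (assumed criterion): rank by single-feature holdout accuracy."""
    n = len(X_tr[0])
    scores = {c: accuracy(X_tr, y_tr, X_val, y_val, [c]) for c in range(n)}
    return sorted(range(n), key=lambda c: scores[c], reverse=True)

def prune(X_tr, y_tr, X_val, y_val, ranked, top_k):
    """Step 2: drop the bottom-ranked feature one at a time and keep the
    smallest prefix of the ranking whose accuracy is at least as good."""
    kept = ranked[:top_k]
    best_cols, best_acc = list(kept), accuracy(X_tr, y_tr, X_val, y_val, kept)
    while len(kept) > 1:
        kept = kept[:-1]
        acc = accuracy(X_tr, y_tr, X_val, y_val, kept)
        if acc >= best_acc:
            best_cols, best_acc = list(kept), acc
    return best_cols, best_acc

# Synthetic two-class data: feature 0 is strongly informative,
# feature 1 weakly informative, and the remaining features are noise.
rng = random.Random(0)

def make_data(n_per_class, n_features=10):
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 4 * label
            row[1] += 2 * label
            X.append(row)
            y.append(label)
    return X, y

X_tr, y_tr = make_data(100)
X_val, y_val = make_data(100)
ranked = rank_features(X_tr, y_tr, X_val, y_val)
best_cols, best_acc = prune(X_tr, y_tr, X_val, y_val, ranked, top_k=5)
print("ranking:", ranked)
print("selected features:", best_cols, "holdout accuracy:", best_acc)
```

In this sketch the same holdout set is used both to rank features and to score the pruned sets, which keeps the example short but would overstate accuracy on real data; a separate test set or cross-validation would be the safer choice.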

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Bayes Theorem*
  • Biomarkers
  • Gene Expression Profiling
  • Humans
  • Microarray Analysis / statistics & numerical data*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Polymorphism, Single Nucleotide
  • Proteomics / methods
  • Proteomics / statistics & numerical data*

Substances

  • Biomarkers