Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles

IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):333-43. doi: 10.1109/TCBB.2007.70239.

Abstract

Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies based on gene expression profiling. Many studies have aggregated binary classifiers to construct a multiclass classifier using one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, and several have compared these strategies. However, these studies found that the best coding depends on the situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is optimal in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value that is optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can provide a consistent answer to it. We apply this method to various classification problems, including a synthesized data set and several cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method improves classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
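
The weighted aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' tuning procedure, which the abstract does not specify. It assumes a code matrix with entries in {+1, 0, -1}, binary classifier outputs in [-1, 1], and a linear weighted vote. The function names (`one_vs_rest_code`, `one_vs_one_code`, `weighted_decode`) and all numeric values are hypothetical, and the weights are picked by hand rather than tuned from data.

```python
from itertools import combinations

import numpy as np


def one_vs_rest_code(k):
    """1R code matrix: classifier j treats class j as +1 and all others as -1."""
    return 2 * np.eye(k, dtype=int) - 1


def one_vs_one_code(k):
    """11 code matrix: one classifier per class pair (a, b); a is +1, b is -1, rest 0."""
    pairs = list(combinations(range(k), 2))
    code = np.zeros((k, len(pairs)), dtype=int)
    for j, (a, b) in enumerate(pairs):
        code[a, j] = 1
        code[b, j] = -1
    return code


def weighted_decode(code, outputs, weights):
    """Return the class with the highest weighted agreement for each sample.

    score[n, c] = sum_j weights[j] * code[c, j] * outputs[n, j]
    """
    scores = (outputs * weights) @ code.T
    return scores.argmax(axis=1)


# Pool 1R and 11 classifiers for a 3-class problem (6 binary classifiers in total).
code = np.hstack([one_vs_rest_code(3), one_vs_one_code(3)])

# Hypothetical outputs of the 6 binary classifiers for a single sample.
outputs = np.array([[0.2, 0.1, -0.5, -0.6, 0.9, 0.4]])

uniform = np.ones(6)                               # simple unweighted voting
only_1r = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # hand-picked: ignore the 11 classifiers

pred_uniform = weighted_decode(code, outputs, uniform)
pred_only_1r = weighted_decode(code, outputs, only_1r)
```

In this toy sample the two weightings select different classes, which is the sense in which the choice of weights acts as a choice of coding. A data-driven method would fit the weights to observed data instead of fixing them in advance.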

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Bayes Theorem
  • Computer Simulation
  • Esophageal Neoplasms / classification
  • Esophageal Neoplasms / diagnosis
  • Esophageal Neoplasms / genetics
  • Gene Expression Profiling*
  • Humans
  • Leukemia / classification
  • Leukemia / diagnosis
  • Leukemia / genetics
  • Models, Statistical*
  • Neoplasms / classification
  • Neoplasms / diagnosis*
  • Neoplasms / genetics
  • Reproducibility of Results
  • Thyroid Neoplasms / classification
  • Thyroid Neoplasms / diagnosis
  • Thyroid Neoplasms / genetics