Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles

Naoto Yukinawa; Shigeyuki Oba; Kikuya Kato; Shin Ishii

doi:10.1109/TCBB.2007.70239

IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):333-43. doi: 10.1109/TCBB.2007.70239.

Authors

Naoto Yukinawa¹, Shigeyuki Oba, Kikuya Kato, Shin Ishii

Affiliation

¹ Graduate School of Information Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan.

PMID: 19407356
DOI: 10.1109/TCBB.2007.70239

Abstract

Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies based on gene expression profiling. Many studies have aggregated binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, and several studies have compared these strategies. However, these comparisons found that the best coding depends on the situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is optimal in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value that is optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can serve as a consistent answer to it. We apply this method to various classification problems, including a synthesized data set and several cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method improves classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
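
To make the idea of weighted aggregation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common code-matrix view of 1R and 11 codings, with one row per binary classifier and entries +1, -1, or 0, and it decodes by a weighted sum of binary outputs. The function names (`one_vs_rest_code`, `one_vs_one_code`, `weighted_vote`) and the weight values are hypothetical; the weights are hand-picked for illustration, whereas the paper tunes them from the observed data by a procedure the abstract does not detail.

```python
import numpy as np

def one_vs_rest_code(n_classes):
    """1R coding: one row per classifier, +1 for its target class, -1 for the rest."""
    return 2 * np.eye(n_classes, dtype=int) - 1

def one_vs_one_code(n_classes):
    """11 coding: one row per class pair (i, j), +1 for i, -1 for j, 0 elsewhere."""
    rows = []
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            row = np.zeros(n_classes, dtype=int)
            row[i], row[j] = 1, -1
            rows.append(row)
    return np.array(rows)

def weighted_vote(code, binary_outputs, weights=None):
    """Score class c as sum_b w_b * output_b * code[b, c]; return (argmax, scores).

    With weights=None every classifier counts equally (simple voting).
    """
    code = np.asarray(code, dtype=float)
    outputs = np.asarray(binary_outputs, dtype=float)
    if weights is None:
        weights = np.ones(len(code))
    scores = (np.asarray(weights, dtype=float) * outputs) @ code
    return int(np.argmax(scores)), scores

# Three classes, 11 coding: rows are the pairs (0,1), (0,2), (1,2).
code = one_vs_one_code(3)
outputs = [+1, -1, +1]            # the three classifiers vote for 0, 2, and 1

# Simple voting: each class gets one vote, so the scores are tied.
uniform_label, uniform_scores = weighted_vote(code, outputs)

# Hypothetical weights (assumed for illustration, not tuned from data).
weights = [0.2, 1.0, 0.5]
weighted_label, weighted_scores = weighted_vote(code, outputs, weights)
print(uniform_scores, weighted_label, weighted_scores)
```

In this toy case simple voting produces a three-way tie, while the weighted decoder selects class 2 because the (0,2) classifier carries the largest weight. Rows from `one_vs_rest_code` and `one_vs_one_code` can be stacked with `np.vstack` to let a single weight vector span both codings.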

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Artificial Intelligence
Bayes Theorem
Computer Simulation
Esophageal Neoplasms / classification
Esophageal Neoplasms / diagnosis
Esophageal Neoplasms / genetics
Gene Expression Profiling*
Humans
Leukemia / classification
Leukemia / diagnosis
Leukemia / genetics
Models, Statistical*
Neoplasms / classification
Neoplasms / diagnosis*
Neoplasms / genetics
Reproducibility of Results
Thyroid Neoplasms / classification
Thyroid Neoplasms / diagnosis
Thyroid Neoplasms / genetics