Boosting for tumor classification with gene expression data

Marcel Dettling; Peter Bühlmann

doi:10.1093/bioinformatics/btf867

Bioinformatics. 2003 Jun 12;19(9):1061-9. doi: 10.1093/bioinformatics/btf867.

Authors

Marcel Dettling¹, Peter Bühlmann

Affiliation

¹ Seminar für Statistik, ETH Zürich, CH-8092, Switzerland.

PMID: 12801866
DOI: 10.1093/bioinformatics/btf867

Abstract

Motivation: Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. Accurate supervised classification of tissue samples in such high-dimensional problems is difficult but often crucial for successful diagnosis and treatment. A promising way to meet this challenge is to use boosting in conjunction with decision trees.

Results: We demonstrate that the generic boosting algorithm needs some modification to become an accurate classifier in the context of gene expression data. In particular, we present a feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems. These modifications yield slight to drastic increases in performance and give competitive results on several publicly available datasets.
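
The general idea of combining feature preselection with boosted decision trees can be illustrated with a minimal Python sketch. The abstract does not give algorithmic details, so everything below is an assumption made for illustration: genes are ranked by a simple rank-sum score (ties are not handled), the booster is plain AdaBoost with one-split trees (stumps), and the data are synthetic, with gene 3 made informative by construction. All function names are hypothetical. This is not the authors' R software, and it does not reproduce their more robust boosting procedure or their approach for multi-categorical problems.

```python
import math
import random


def rank_sum_score(values, labels):
    """Deviation of the class +1 rank sum from its expectation under chance."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank_sum = sum(r for r, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(1 for lab in labels if lab == 1)
    return abs(rank_sum - n_pos * (len(values) + 1) / 2.0)


def preselect(X, y, k):
    """Indices of the k genes with the largest rank-sum score."""
    n_genes = len(X[0])
    scores = [rank_sum_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def stump_predict(row, gene, threshold, sign):
    """One-split tree: predict `sign` above the threshold, `-sign` otherwise."""
    return sign if row[gene] > threshold else -sign


def fit_stump(X, y, w, genes):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = None
    for g in genes:
        for t in sorted({row[g] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, g, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, g, t, sign)
    return best


def adaboost(X, y, genes, rounds):
    """Plain AdaBoost over stumps; labels must be coded as -1 / +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, g, t, sign = fit_stump(X, y, w, genes)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, g, t, sign))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, g, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(row, g, t, s) for a, g, t, s in ensemble)
    return 1 if score >= 0 else -1


# Synthetic data: 20 samples, 50 genes of pure noise, gene 3 made informative.
rng = random.Random(0)
y = [1] * 10 + [-1] * 10
X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in y]
for row, yi in zip(X, y):
    row[3] = 3.0 * yi + rng.uniform(-1.0, 1.0)

genes = preselect(X, y, k=5)
ensemble = adaboost(X, y, genes, rounds=10)
accuracy = sum(predict(ensemble, row) == yi for row, yi in zip(X, y)) / len(y)
print("preselected genes:", genes)
print("training accuracy:", accuracy)
```

Restricting the stump search to the preselected genes is what keeps the exhaustive search cheap when there are thousands of genes and only a few dozen samples. The accuracy printed here is measured on the training data only, so it says nothing about generalization; a real evaluation would use held-out samples or cross-validation, with the preselection repeated inside each fold.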

Availability: Software for the modified boosting algorithms, as well as for decision trees, is freely available in R at http://stat.ethz.ch/~dettling/boosting.html.

Publication types

Comparative Study
Evaluation Study
Validation Study

MeSH terms

Algorithms*
Cluster Analysis*
Databases, Genetic
Decision Trees
Gene Expression Profiling / methods*
Gene Expression Regulation, Neoplastic / genetics*
Neoplasms / classification*
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis / methods*
Pattern Recognition, Automated*
Reproducibility of Results
Sensitivity and Specificity