CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data

M Slawski; M Daumer; A-L Boulesteix

doi:10.1186/1471-2105-9-439

BMC Bioinformatics. 2008 Oct 16;9:439. doi: 10.1186/1471-2105-9-439.

Affiliation

¹ Sylvia Lawry Centre for Multiple Sclerosis Research, Munich, Germany.

Abstract

Background: For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics and biomedical research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "p >> n" setting, where the number of predictors p far exceeds the number of observations n; for this reason the task is often called an "ill-posed problem". Selecting and evaluating models carefully, in line with accepted good-practice standards, is a very complex task for statisticians without experience in this area and for scientists with limited statistical background. The multiplicity of available methods for class prediction from high-dimensional data is an additional practical challenge for inexperienced researchers.
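As a brief illustration of why the p >> n setting is ill-posed (a standard linear-algebra argument, not taken from the CMA paper itself), consider ordinary least squares and linear discriminant analysis with a data matrix X of n observations and p predictors, and K classes:

```latex
\begin{aligned}
&X \in \mathbb{R}^{n \times p}, \qquad n < p \\
&\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le n < p
  \;\Rightarrow\; X^\top X \text{ is singular} \\
&X^\top X \hat\beta = X^\top y
  \;\Rightarrow\; X^\top X (\hat\beta + v) = X^\top y
  \quad \forall\, v \in \ker(X), \quad \dim \ker(X) \ge p - n > 0 \\
&\operatorname{rank}(\hat\Sigma_{\text{pooled}}) \le n - K < p
  \;\Rightarrow\; \hat\Sigma_{\text{pooled}}^{-1} \text{ does not exist}
\end{aligned}
```

The least-squares estimate is therefore not unique, and the pooled within-class covariance matrix required by classical linear discriminant analysis cannot be inverted, so these methods cannot be applied as-is without regularization or prior variable selection.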

Results: In this article, we introduce a new Bioconductor package called CMA (standing for "Classification for MicroArrays") for automatically performing variable selection, parameter tuning, classifier construction, and unbiased evaluation of the constructed classifiers using a large number of commonly used methods. With little time and effort, users obtain an overview of the unbiased accuracy of most top-performing classifiers. Furthermore, the standardized evaluation framework underlying CMA can also benefit statistical research, for instance when a new classifier has to be compared to existing approaches.
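A key requirement behind unbiased evaluation in this setting is a general good-practice principle: variable selection (and parameter tuning) must be repeated on each learning set and must never see the test observations. The Python sketch below illustrates this principle for variable selection only. It does not use CMA's R interface; the function names (t_scores, cv_error, make_data, etc.), the t-statistic ranking, the nearest-centroid classifier and the `leak` flag are hypothetical choices made for illustration.

```python
import random
import statistics


def t_scores(X, y):
    """Absolute two-sample t-like statistic for every variable (column) of X."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (statistics.pvariance(a) / len(a) + statistics.pvariance(b) / len(b)) ** 0.5
        # The small constant avoids division by zero for constant variables.
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring variables."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Class means restricted to the selected variables (nearest-centroid classifier)."""
    return {
        c: [statistics.mean([row[j] for row, label in zip(X, y) if label == c]) for j in genes]
        for c in (0, 1)
    }


def predict(centroids, x, genes):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(genes, centroids[c])))


def stratified_folds(y, n_folds, seed):
    """Split indices into folds so that each class is spread evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for c in (0, 1):
        members = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


def cv_error(X, y, k_genes, n_folds=5, seed=0, leak=False):
    """Cross-validated misclassification rate.

    With leak=False, variables are ranked on each training fold only (unbiased).
    With leak=True, they are ranked once on the full data set, so the test
    observations influence the selection (optimistically biased).
    """
    global_genes = top_k(t_scores(X, y), k_genes) if leak else None
    errors = 0
    for test in stratified_folds(y, n_folds, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        genes = global_genes if leak else top_k(t_scores(X_tr, y_tr), k_genes)
        centroids = fit_centroids(X_tr, y_tr, genes)
        errors += sum(predict(centroids, X[i], genes) != y[i] for i in test)
    return errors / len(y)


def make_data(n_per_class, p, signal, seed=0):
    """Hypothetical toy data: variable 0 is -signal/+signal by class, the rest is N(0, 1) noise."""
    rng = random.Random(seed)
    X, y = [], []
    for c in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            row[0] = signal if c == 1 else -signal
            X.append(row)
            y.append(c)
    return X, y


if __name__ == "__main__":
    X, y = make_data(n_per_class=10, p=500, signal=0.0, seed=1)  # pure noise, p >> n
    print("selection inside each training fold:", cv_error(X, y, k_genes=10))
    print("selection on the full data set:", cv_error(X, y, k_genes=10, leak=True))
```

On pure-noise data the true error rate of any classifier is 50% for balanced classes, yet the leaky variant typically reports a much lower cross-validated error, because the selected variables were chosen partly for how well they separate the test observations.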

Conclusion: CMA is a user-friendly, comprehensive package for classifier construction and evaluation that implements most common approaches. It is freely available from the Bioconductor website (http://bioconductor.org/packages/2.3/bioc/html/CMA.html).

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Area Under Curve
Computational Biology / methods*
Computer Simulation
Discriminant Analysis
Internet
Least-Squares Analysis
Logistic Models
Microarray Analysis*
Models, Statistical
Monte Carlo Method
Neural Networks, Computer
Reproducibility of Results
Sensitivity and Specificity
Software*
Statistics, Nonparametric
User-Computer Interface