Improved bolstering error estimation for gene ranking

Kiet N T Huynh; John H Phan; Tan M Vo; May D Wang

doi:10.1109/IEMBS.2007.4353372

Annu Int Conf IEEE Eng Med Biol Soc. 2007:2007:4633-6. doi: 10.1109/IEMBS.2007.4353372.

Authors

Kiet N T Huynh¹, John H Phan, Tan M Vo, May D Wang

Affiliation

¹ Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA.

PMID: 18003038

Abstract

Many methods have been proposed to identify differentially expressed genes in diseased tissues, and the apparent performance of any such method depends closely on the metric used to evaluate it. We examine several error estimation algorithms (cross-validation, bootstrap, resubstitution, and resubstitution with bolstering) for three classifiers (support vector machine, Fisher's discriminant, and signed distance function). To control for classifier overfitting, which is usually caused by the small sample sizes of many real datasets, we generate synthetic datasets based on real data. This lets us monitor the impact of sample size on each metric. We find that resubstitution with bolstering gives the best results, particularly in terms of computational efficiency. However, classical bolstering tends to be biased in high dimensions, so we further investigate ways to reduce the bias of the bolstered estimate without increasing its computational cost. Our results indicate that the estimator tends to become unbiased as the sample size increases. We also find that modified bolstering is the best of all the metrics in terms of both estimation accuracy and computational efficiency.
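As a rough illustration of the bolstering idea the abstract refers to, the sketch below computes plain resubstitution error and Gaussian-bolstered resubstitution error for a linear decision rule. It assumes classical bolstering as we understand it from Braga-Neto and Dougherty: each training point is replaced by a spherical Gaussian kernel, and its error contribution is the kernel mass falling on the wrong side of the decision boundary, which for a linear rule reduces to Phi(-margin / sigma). It is not the modified bolstering studied in this paper. The nearest-mean classifier (a stand-in for the SVM, Fisher's discriminant, and signed distance function classifiers), the kernel-width heuristic with its Wilson-Hilferty chi-median approximation, and all function names are illustrative assumptions.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def nearest_mean_rule(X, y):
    """Hypothetical stand-in classifier: a linear rule (w, b) that predicts
    class 1 when w.x + b > 0, with w joining the two class means and the
    boundary passing through their midpoint."""
    p = len(X[0])
    m0 = [sum(x[k] for x, t in zip(X, y) if t == 0) / y.count(0) for k in range(p)]
    m1 = [sum(x[k] for x, t in zip(X, y) if t == 1) / y.count(1) for k in range(p)]
    w = [a - c for a, c in zip(m1, m0)]
    b = -dot(w, [(a + c) / 2.0 for a, c in zip(m1, m0)])
    return w, b


def chi_median(p):
    """Approximate median of a chi distribution with p degrees of freedom,
    via the Wilson-Hilferty approximation to the chi-square median (assumption)."""
    return math.sqrt(p * (1.0 - 2.0 / (9.0 * p)) ** 3)


def kernel_sigmas(X, y):
    """Assumed kernel-width heuristic: mean nearest-neighbour distance within
    each class, divided by the chi median for dimension p.
    Requires at least two samples per class."""
    p = len(X[0])
    width = {}
    for c in (0, 1):
        pts = [x for x, t in zip(X, y) if t == c]
        nn = [min(math.dist(a, q) for j, q in enumerate(pts) if j != i)
              for i, a in enumerate(pts)]
        width[c] = (sum(nn) / len(nn)) / chi_median(p)
    return [width[t] for t in y]


def resub_error(X, y, w, b):
    """Plain resubstitution: fraction of training points misclassified."""
    wrong = sum((dot(w, x) + b > 0) != (t == 1) for x, t in zip(X, y))
    return wrong / len(y)


def bolstered_resub_error(X, y, w, b, sigmas):
    """Gaussian bolstered resubstitution for a linear rule: each training
    point contributes the mass of N(x_i, sigma_i^2 I) lying on the wrong
    side of the hyperplane w.x + b = 0, i.e. Phi(-margin / sigma_i)."""
    norm_w = math.sqrt(dot(w, w))
    total = 0.0
    for x, t, s in zip(X, y, sigmas):
        dist = (dot(w, x) + b) / norm_w      # signed distance to the hyperplane
        margin = dist if t == 1 else -dist   # positive on the correct side
        total += norm_cdf(-margin / s)
    return total / len(y)


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = nearest_mean_rule(X, y)
    print("resubstitution:", resub_error(X, y, w, b))
    print("bolstered resubstitution:",
          bolstered_resub_error(X, y, w, b, kernel_sigmas(X, y)))
```

On this toy data every training point is classified correctly, so plain resubstitution reports zero error, while the bolstered estimate is small but positive because kernel mass from points near the boundary spills across it. Both estimates need only one fit of the classifier, whereas k-fold cross-validation needs k fits, which is the kind of computational advantage the abstract attributes to bolstering.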

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Computer Simulation
Gene Expression Profiling / methods*
Gene Expression Regulation*
Humans
Oligonucleotide Array Sequence Analysis / methods*
Selection Bias
Sensitivity and Specificity
Software*
