A combinational feature selection and ensemble neural network method for classification of gene expression data

Bing Liu; Qinghua Cui; Tianzi Jiang; Songde Ma

doi:10.1186/1471-2105-5-136


BMC Bioinformatics. 2004 Sep 27;5:136. doi: 10.1186/1471-2105-5-136.

Authors

Bing Liu¹, Qinghua Cui, Tianzi Jiang, Songde Ma

Affiliation

¹ National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, P. R. China.

Abstract

Background: Microarray experiments are becoming a powerful tool for clinical diagnosis because they can reveal gene expression patterns characteristic of a particular disease. To date, this problem has received the most attention in cancer research, especially tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. However, most published articles on tumor classification have applied a single technique to a single dataset, and only recently have several researchers compared these techniques across several public datasets. Yet it has been shown that different feature selection methods capture different aspects of a dataset, and that some selected feature subsets give better solutions on particular problems. At the same time, given large amounts of microarray data and little prior knowledge, it is difficult to uncover the intrinsic characteristics of the data with traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification in general.
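The abstract names the two ingredients of the method, combinational feature selection and an ensemble of neural networks, but does not give details. The following is a minimal sketch of the general idea, not the authors' implementation. Several gene-ranking criteria each choose their own top-k genes, one small network is trained on each gene subset, and the members vote. Everything specific here is an illustrative assumption: the three criteria (signal-to-noise ratio `snr`, Welch's t-statistic `welch_t`, absolute mean difference `mean_diff`), the use of single sigmoid neurons as ensemble members, majority voting, the toy data `X_TOY`/`Y_TOY`, and every function name.

```python
import math
import random

# Illustrative sketch only: criteria, network size and voting rule are assumptions,
# not the settings used in the paper.
EPS = 1e-9  # guards against division by zero for constant genes


def class_stats(X, y, j):
    """Mean, sample variance and count of gene j within each class (labels 0/1)."""
    stats = {}
    for c in (0, 1):
        vals = [row[j] for row, label in zip(X, y) if label == c]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / (len(vals) - 1)
        stats[c] = (mu, var, len(vals))
    return stats


# Three hypothetical filter criteria; each may rank genes differently.
def snr(X, y, j):
    s = class_stats(X, y, j)
    (m0, v0, _), (m1, v1, _) = s[0], s[1]
    return abs(m1 - m0) / (math.sqrt(v0) + math.sqrt(v1) + EPS)


def welch_t(X, y, j):
    s = class_stats(X, y, j)
    (m0, v0, n0), (m1, v1, n1) = s[0], s[1]
    return abs(m1 - m0) / math.sqrt(v0 / n0 + v1 / n1 + EPS)


def mean_diff(X, y, j):
    s = class_stats(X, y, j)
    return abs(s[1][0] - s[0][0])


def top_k(X, y, score, k):
    """Indices of the k genes ranked highest by the given criterion."""
    return sorted(range(len(X[0])), key=lambda j: score(X, y, j), reverse=True)[:k]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_unit(X, y, feats, epochs=200, lr=0.1, seed=0):
    """Train one sigmoid neuron (a minimal neural network) by SGD on the chosen genes."""
    rng = random.Random(seed)
    w, b = [0.0] * len(feats), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wm * X[i][f] for wm, f in zip(w, feats))
            g = sigmoid(z) - y[i]  # gradient of cross-entropy loss w.r.t. z
            w = [wm - lr * g * X[i][f] for wm, f in zip(w, feats)]
            b -= lr * g
    return feats, w, b


def predict_unit(model, x):
    feats, w, b = model
    return 1 if b + sum(wm * x[f] for wm, f in zip(w, feats)) >= 0 else 0


def fit_ensemble(X, y, k=2):
    """One member per criterion, each trained only on that criterion's top-k genes."""
    return [train_unit(X, y, top_k(X, y, crit, k), seed=s)
            for s, crit in enumerate((snr, welch_t, mean_diff))]


def predict_ensemble(models, x):
    # Majority vote; an odd number of members avoids ties.
    votes = sum(predict_unit(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 1 are informative.
X_TOY = [
    [2.0, 1.8, 0.1, -0.2],
    [2.2, 2.1, -0.3, 0.4],
    [1.9, 2.3, 0.2, 0.1],
    [-2.1, -1.9, 0.0, 0.3],
    [-1.8, -2.2, -0.1, -0.4],
    [-2.0, -2.0, 0.3, 0.2],
]
Y_TOY = [1, 1, 1, 0, 0, 0]
```

On real microarray data, the criteria would typically select partly different genes, so each member sees a different view of the data. This matches the abstract's motivation that differently selected features reflect different aspects of a dataset. On this tiny toy set, all three criteria happen to pick genes 0 and 1.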

Results: We validate our new method on several recent publicly available datasets, using both the predictive accuracy on test samples and cross-validation. Compared with the best performance of other current methods, our new strategy yields markedly improved results on a wide range of datasets.
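The abstract does not say which cross-validation scheme was used. The following is a minimal sketch of generic k-fold cross-validation, in which setting k to the number of samples gives leave-one-out. The function names, the fold assignment `i % k`, and the nearest-centroid stand-in classifier are hypothetical. Any classifier given as a `fit`/`predict` pair, such as an ensemble of the kind described in this paper, could be substituted.

```python
def k_fold_accuracy(X, y, fit, predict, k):
    """Estimate accuracy by k-fold cross-validation (k = len(X) gives leave-one-out).

    Sample i goes to fold i % k; each fold is held out once while the model is
    refit on the remaining samples. Any feature selection belongs inside `fit`,
    so genes are re-selected on every training split.
    """
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


# Hypothetical stand-in classifier: nearest class centroid (squared Euclidean distance).
def fit_centroids(X, y):
    centroids = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 1 are informative.
X_TOY = [
    [2.0, 1.8, 0.1, -0.2],
    [2.2, 2.1, -0.3, 0.4],
    [1.9, 2.3, 0.2, 0.1],
    [-2.1, -1.9, 0.0, 0.3],
    [-1.8, -2.2, -0.1, -0.4],
    [-2.0, -2.0, 0.3, 0.2],
]
Y_TOY = [1, 1, 1, 0, 0, 0]
```

Refitting inside each fold matters for gene expression data. Selecting genes on the full dataset before splitting lets the held-out samples influence the model and biases the accuracy estimate upward.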

Conclusions: We conclude that our method can extract more information from microarray data, yielding more accurate classification, and can also help identify latent marker genes of diseases for better diagnosis and treatment.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Acute Disease
Artificial Intelligence
Colonic Neoplasms / classification
Colonic Neoplasms / genetics
Female
Gene Expression Profiling / classification*
Gene Expression Profiling / methods*
Gene Expression Regulation, Neoplastic / genetics*
Humans
Leukemia, Myeloid / classification
Leukemia, Myeloid / genetics
Lung Neoplasms / classification
Lung Neoplasms / genetics
Lymphoma, B-Cell / classification
Lymphoma, B-Cell / genetics
Lymphoma, Large B-Cell, Diffuse / classification
Lymphoma, Large B-Cell, Diffuse / genetics
Male
Neural Networks, Computer*
Oligonucleotide Array Sequence Analysis / classification*
Oligonucleotide Array Sequence Analysis / methods*
Ovarian Neoplasms / classification
Ovarian Neoplasms / genetics
Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification
Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
Predictive Value of Tests
Prostatic Neoplasms / classification
Prostatic Neoplasms / genetics