Ensemble dependence model for classification and prediction of cancer and normal gene expression data

Bioinformatics. 2005 Jul 15;21(14):3114-21. doi: 10.1093/bioinformatics/bti483. Epub 2005 May 6.

Abstract

Motivation: DNA microarray technologies make it possible to monitor the expression levels of thousands of genes simultaneously. A topic of great interest is how expression profiles differ between microarray samples from cancer patients and those from normal subjects, and how such samples can be classified on the basis of their gene expression levels. Various clustering methods have been proposed in the literature to classify cancer and normal samples from microarray data, and they are predominantly data-driven approaches. In this paper, we propose an alternative, model-driven approach, which can reveal the relationship between the global gene expression profile and the subject's health status, and is therefore promising for predicting the early development of cancer.

Results: We propose an ensemble dependence model aimed at exploring the group dependence relationships of gene clusters. Within a hypothesis-testing framework, we use the dependence relationships among genes as a feature to model and classify cancer and normal samples. The proposed classification scheme is applied to several real cancer datasets, including cDNA microarray, Affymetrix microarray and proteomic data, and yields very promising performance. We further investigate the eigenvalue patterns of the proposed models and find that they differ between cancer and normal samples. Moreover, the transition between cancer and normal patterns suggests that these eigenvalue patterns may have the potential to predict the early stage of cancer development. Finally, we examine the effects of possible model mismatch on the proposed scheme.
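The abstract does not give the model equations, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes a linear dependence model in which each gene of one cluster is regressed on the other genes of that cluster, separately for each class, with Gaussian residuals. A new sample is assigned to the class whose model gives the higher pseudo-log-likelihood, which stands in here for the paper's hypothesis test. The eigenvalues of the fitted dependence matrix play the role of the "eigenvalue pattern". All function names, the ridge parameter and the class labels are illustrative assumptions.

```python
import numpy as np


def fit_dependence_model(X, ridge=1e-3):
    """Hypothetical per-class dependence model for one gene cluster.

    X: array of shape (samples, genes). Each gene is regressed on all other
    genes (ridge-regularised least squares). Returns the gene means, a
    dependence matrix W with zero diagonal, and per-gene residual variances.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = np.zeros((p, p))
    resid_var = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        A, y = Xc[:, others], Xc[:, j]
        # Ridge term keeps the system solvable when genes outnumber samples.
        coef = np.linalg.solve(A.T @ A + ridge * np.eye(p - 1), A.T @ y)
        W[j, others] = coef
        resid_var[j] = (y - A @ coef).var() + 1e-12
    return mu, W, resid_var


def log_score(x, model):
    """Gaussian pseudo-log-likelihood of one sample x under a fitted model."""
    mu, W, resid_var = model
    xc = x - mu
    r = xc - W @ xc  # residual of each gene given the other genes
    return -0.5 * np.sum(r ** 2 / resid_var + np.log(2 * np.pi * resid_var))


def classify(x, cancer_model, normal_model):
    """Pick the class whose dependence model explains x better."""
    if log_score(x, cancer_model) > log_score(x, normal_model):
        return "cancer"
    return "normal"


def dependence_eigenvalues(model):
    """Eigenvalue magnitudes of W, sorted in descending order."""
    _, W, _ = model
    return np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
```

Summing per-gene conditional Gaussian log densities gives a pseudo-likelihood rather than an exact joint likelihood. This choice keeps the sketch short and makes the dependence structure, rather than the mean expression level, drive the decision.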

Publication types

  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Biomarkers, Tumor / analysis
  • Biomarkers, Tumor / metabolism*
  • Computer Simulation
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling / methods*
  • Humans
  • Models, Genetic
  • Neoplasm Proteins / analysis
  • Neoplasm Proteins / metabolism*
  • Neoplasms / diagnosis*
  • Neoplasms / genetics
  • Neoplasms / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins