Variable selection and pattern recognition with gene expression data generated by the microarray technology

Math Biosci. 2002 Mar;176(1):71-98. doi: 10.1016/s0025-5564(01)00103-1.

Abstract

Lack of adequate statistical methods for the analysis of microarray data remains the most critical deterrent to uncovering the true potential of these promising techniques in basic and translational biological studies. The popular practice of drawing important biological conclusions from just one replicate (slide) should be discouraged. In this paper, we discuss some modern trends in statistical analysis of microarray data with a special focus on statistical classification (pattern recognition) and variable selection. In addressing these issues we consider the utility of some distances between random vectors and their nonparametric estimates obtained from gene expression data. Performance of the proposed distances is tested by computer simulations and analysis of gene expression data on two different types of human leukemia. In experimental settings, the error rate is estimated by cross-validation, while a control sample is generated in computer simulation experiments aimed at testing the proposed gene selection procedures and associated classification rules.
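The workflow outlined in the abstract (score genes by a distance between the two classes, keep the top-scoring genes, classify, and estimate the error rate by cross-validation) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the per-gene energy-type distance 2E|X-Y| - E|X-X'| - E|Y-Y'| with a plug-in estimate, the nearest-centroid rule, leave-one-out as the cross-validation scheme, the synthetic Gaussian data, and all function names. The abstract does not state which distances or classification rules the authors used.

```python
import random

def mean_abs_diff(xs, ys):
    """Average |x - y| over all pairs (x from xs, y from ys)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    """Plug-in estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| for one gene (illustrative choice)."""
    return 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)

def select_genes(samples, labels, k):
    """Rank genes by the estimated distance between classes 0 and 1; keep the top k."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append((energy_distance(a, b), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid(train, labels, genes, x):
    """Assign x to the class whose centroid, on the selected genes, is closest."""
    best, best_d = None, float("inf")
    for c in (0, 1):
        rows = [s for s, y in zip(train, labels) if y == c]
        d = sum((x[g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
        if d < best_d:
            best, best_d = c, d
    return best

def loo_error(samples, labels, k):
    """Leave-one-out error rate; gene selection is repeated inside every fold."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        y_train = labels[:i] + labels[i + 1:]
        genes = select_genes(train, y_train, k)
        errors += nearest_centroid(train, y_train, genes, samples[i]) != labels[i]
    return errors / len(samples)

# Synthetic stand-in for expression data: 10 samples per class, 50 genes,
# of which the first 5 are shifted in class 1 (all values are made up).
rng = random.Random(0)
samples, labels = [], []
for c in (0, 1):
    for _ in range(10):
        samples.append([rng.gauss(3.0 * c if g < 5 else 0.0, 1.0) for g in range(50)])
        labels.append(c)

selected = select_genes(samples, labels, k=5)
error_rate = loo_error(samples, labels, k=5)
print("selected genes:", sorted(selected))
print("leave-one-out error rate:", error_rate)
```

Gene selection is repeated inside each fold so that the held-out sample never influences which genes are chosen; selecting genes once on the full data set and cross-validating only the classifier tends to give an optimistically biased error estimate.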

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Humans
  • Leukemia, Myeloid, Acute / genetics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics