Microarray analysis of gene expression: considerations in data mining and statistical treatment

Physiol Genomics. 2006 May 16;25(3):355-63. doi: 10.1152/physiolgenomics.00314.2004. Epub 2006 Mar 22.

Abstract

DNA microarrays are a powerful tool for biomedical discovery. Harnessing the potential of this technology depends on the development and appropriate use of data mining and statistical tools. Significant recent advances have made microarray data mining more versatile. Researchers are no longer limited to default choices that generate suboptimal results. Conflicting results in repeated experiments can be resolved through attention to the statistical details. In the current dynamic environment, there are many choices and potential pitfalls for researchers who intend to incorporate microarrays as a research tool. This review is intended to provide a simple framework for understanding the choices and identifying the pitfalls. Specifically, it discusses the choice of microarray platform, preprocessing of raw data, differential expression and validation, clustering, annotation and functional characterization of genes, and pathway construction in light of emergent concepts and tools.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Cluster Analysis
  • Data Interpretation, Statistical*
  • Databases, Genetic
  • Gene Expression Profiling*
  • Information Storage and Retrieval
  • Oligonucleotide Array Sequence Analysis*
  • Reproducibility of Results
  • Software*