A statistical perspective on gene expression data analysis

Stat Med. 2003 Feb 15;22(3):481-99. doi: 10.1002/sim.1350.

Abstract

Rapid advances in biotechnology have led to increasing interest in the use of oligonucleotide and spotted cDNA gene expression microarrays for medical research. These arrays are widely used to understand the underlying genetic structure of various diseases, with the ultimate goal of providing better diagnosis, prevention and cure. The technology measures expression levels of several thousand genes simultaneously and thus produces an enormous amount of data. The statistician's role is critical to the successful design of gene expression studies and to the analysis and interpretation of the resulting voluminous data. This paper discusses hypotheses common to gene expression studies and describes some of the statistical methods suitable for addressing them. S-PLUS and SAS code to perform the statistical methods is provided. Gene expression data from an unpublished oncologic study are used to illustrate these methods.

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Cluster Analysis
  • DNA, Neoplasm / genetics
  • Data Interpretation, Statistical*
  • Factor Analysis, Statistical
  • Gene Expression Regulation*
  • Genomics / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Software

Substances

  • DNA, Neoplasm