Robust singular value decomposition analysis of microarray data

Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13167-72. doi: 10.1073/pnas.1733249100. Epub 2003 Oct 27.

Abstract

In microarray data, a number of biological samples are each assessed for the expression level of a typically large number of genes. Statistical techniques are needed to discern patterns in such data. Our technique combines mathematical and statistical methods to progressively take the data set apart, so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted by extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis are an understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.
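The abstract does not spell out the fitting criterion. One common way to make an SVD robust, consistent with taking the table apart one component at a time while simply skipping missing cells, is to fit each rank-one term by alternating L1 (least-absolute-deviation) regression, where every row or column update reduces to a weighted median. The Python sketch below assumes that criterion. The function names `weighted_median`, `l1_rank1`, and `robust_svd`, the all-ones starting vector, and the convergence tolerance are hypothetical illustrative choices, not the authors' code.

```python
import numpy as np

# Illustrative sketch: robust SVD via alternating L1 regression (an assumed
# criterion); all function names are hypothetical, not from the paper.


def weighted_median(values, weights):
    """Return m minimizing sum(weights * |values - m|) (lower weighted median)."""
    if values.size == 0:
        return 0.0
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted value at which the cumulative weight reaches half the total.
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])


def l1_rank1(X, n_iter=50, tol=1e-9):
    """Fit X ~ a b^T by minimizing sum |X_ij - a_i b_j| over non-NaN cells."""
    observed = ~np.isnan(X)
    n_rows, n_cols = X.shape
    a = np.zeros(n_rows)
    b = np.ones(n_cols)  # assumed starting vector
    for _ in range(n_iter):
        a_old = a.copy()
        # Row step: a_i = weighted median of X_ij / b_j with weights |b_j|.
        for i in range(n_rows):
            m = observed[i] & (b != 0)
            a[i] = weighted_median(X[i, m] / b[m], np.abs(b[m]))
        # Column step: b_j = weighted median of X_ij / a_i with weights |a_i|.
        for j in range(n_cols):
            m = observed[:, j] & (a != 0)
            b[j] = weighted_median(X[m, j] / a[m], np.abs(a[m]))
        if np.max(np.abs(a - a_old)) < tol:
            break
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0, a, b
    # Express the fit SVD-style: a scalar times two unit vectors.
    return na * nb, a / na, b / nb


def robust_svd(X, k):
    """Peel off k robust rank-one components; NaN cells stay missing throughout."""
    R = np.array(X, dtype=float)
    components = []
    for _ in range(k):
        s, u, v = l1_rank1(R)
        components.append((s, u, v))
        R = R - s * np.outer(u, v)  # NaN - x is NaN, so no imputation occurs
    return components, R


# Usage: an exact rank-one table with one outlier and one missing cell.
u_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v_true = np.array([1.0, -1.0, 2.0, 0.5])
X = np.outer(u_true, v_true)
X[0, 1] = 100.0   # outlier (true value is -1)
X[2, 3] = np.nan  # missing value
components, R = robust_svd(X, k=1)
print(np.round(R, 6))  # R[0, 1] is about 101; other observed residuals are ~0
```

Because absolute rather than squared residuals are minimized, the single corrupted cell barely moves the fitted component and instead shows up as a large entry of the residual matrix, which is the kind of isolated sample-by-gene effect the abstract describes. A least-squares SVD would first need the missing cell filled in and would tend to spread the outlier's error across the fitted row and column.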

MeSH terms

  • Cluster Analysis
  • Databases, Genetic
  • Humans
  • Least-Squares Analysis
  • Models, Statistical
  • Models, Theoretical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Statistics as Topic / methods*