Subsystem identification through dimensionality reduction of large-scale gene expression data

Genome Res. 2003 Jul;13(7):1706-18. doi: 10.1101/gr.903503.

Abstract

The availability of parallel, high-throughput biological experiments that simultaneously monitor thousands of cellular observables provides an opportunity to investigate cellular behavior in a highly quantitative manner at multiple levels of resolution. One challenge in exploiting these experimental advances more fully is the need to develop algorithms that provide an analysis at each of the relevant levels of detail. Here, the data analysis method non-negative matrix factorization (NMF) is applied to the analysis of gene array experiments. Whereas current algorithms identify relationships on the basis of large-scale similarity between expression patterns, NMF is a recently developed machine learning technique capable of recognizing similarity between subportions of the data that correspond to localized features in expression space. A large data set consisting of 300 genome-wide expression measurements of yeast was used as sample data to illustrate the performance of the new approach. The local features detected are shown to map well to functional cellular subsystems. Functional relationships predicted by the new analysis are compared with those predicted using standard approaches; validation using bioinformatic databases suggests that predictions made with the new approach may be up to twice as accurate as those of some conventional approaches.
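
To make the idea of a parts-based factorization concrete, the following is a minimal sketch of NMF using the standard multiplicative update rules for the squared-error objective. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function name `nmf`, the rank, the iteration count, and the synthetic genes-by-experiments matrix are all hypothetical, and the input is assumed to be non-negative already (how signed expression ratios are made non-negative is not covered here).

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (genes x experiments) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W (genes x rank) holds the basis vectors ("local features");
    H (rank x experiments) holds their weights in each experiment.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_expts = V.shape
    # Strictly positive random initialization (assumption: any positive start works).
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_expts)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Hypothetical toy data: 60 "genes" in three blocks of 20, each block driven
# by its own non-negative activity profile across 30 "experiments".
data_rng = np.random.default_rng(1)
true_W = np.zeros((60, 3))
true_W[0:20, 0] = 1.0
true_W[20:40, 1] = 1.0
true_W[40:60, 2] = 1.0
true_H = data_rng.random((3, 30))
V = true_W @ true_H

W, H = nmf(V, rank=3)
rel_err = relative_error(V, W, H)

# Group each gene with the basis vector on which it loads most strongly.
dominant_feature = W.argmax(axis=1)
```

Because every entry of `W` and `H` is constrained to be non-negative, the factors can only add up to reconstruct the data, which is what tends to yield localized basis vectors. In this sketch, genes sharing a `dominant_feature` value are treated as candidates for a common subsystem; the index assigned to each group is arbitrary and depends on the initialization.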

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Databases, Protein / statistics & numerical data
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Profiling / trends*
  • Genes, Fungal / genetics
  • Genes, Fungal / physiology
  • Genome, Fungal
  • Predictive Value of Tests
  • Proteome / classification
  • Proteome / genetics
  • Proteome / physiology
  • Saccharomyces cerevisiae Proteins / classification
  • Saccharomyces cerevisiae Proteins / genetics
  • Saccharomyces cerevisiae Proteins / physiology

Substances

  • Proteome
  • Saccharomyces cerevisiae Proteins