A data integration methodology for systems biology: experimental verification

Proc Natl Acad Sci U S A. 2005 Nov 29;102(48):17302-7. doi: 10.1073/pnas.0508649102. Epub 2005 Nov 21.

Abstract

The integration of data from multiple global assays is essential to understanding dynamic spatiotemporal interactions within cells. In a companion paper, we reported a data integration methodology, designated Pointillist, that can handle multiple data types from technologies with different noise characteristics. Here we demonstrate its application to the integration of 18 data sets relating to galactose utilization in yeast. These data include global changes in mRNA and protein abundance, genome-wide protein-DNA interaction data, database information, and computational predictions of protein-DNA and protein-protein interactions. We divided the integration task into three parts, each determining one network component: key system elements (genes and proteins), protein-protein interactions, and protein-DNA interactions. Results indicate that the reconstructed network efficiently focuses on and recapitulates the known biology of galactose utilization. The network also provided new insights, some of which were verified experimentally. The methodology described here addresses a critical need across all domains of molecular and cell biology: the effective integration of large and disparate data sets.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromatin Immunoprecipitation
  • Galactose / genetics*
  • Galactose / metabolism*
  • Informatics / methods*
  • Information Systems*
  • Microarray Analysis
  • Monosaccharide Transport Proteins / metabolism
  • Saccharomyces cerevisiae Proteins / metabolism
  • Software*
  • Systems Biology / methods*
  • Yeasts

Substances

  • GAL2 protein, S cerevisiae
  • HXT7 protein, S cerevisiae
  • Monosaccharide Transport Proteins
  • Saccharomyces cerevisiae Proteins
  • Galactose