Deriving transcriptional programs and functional processes from gene expression databases

Bioinformatics. 2012 Apr 15;28(8):1122-9. doi: 10.1093/bioinformatics/bts112. Epub 2012 Mar 8.

Abstract

Motivation: Revealing the underlying molecular state of a cell with a system-wide approach is a long-standing biological challenge. Gene expression profiles, developed over the last decade, possess the characteristics of such an assay: they can reveal both underlying molecular events and broader phenotypes such as clinical outcomes. To interpret these profiles, many gene sets have been developed that characterize biological processes. However, the full potential of these gene sets has not yet been achieved. Since the advent of gene expression databases, many have posited that such databases can reveal properties of activities that are not evident from individual datasets, analogous to how the expression of a single gene generally cannot reveal the activation of a biological process.

Results: To address this issue, we have developed a high-throughput method to mine gene expression databases for the regulation of gene sets. Given a set of genes, we scored it against each gene expression dataset by looking for enrichment of co-regulated genes relative to an empirical null distribution. After validating the method, we applied it to address two biological problems. First, we deciphered the E2F transcriptional network. We confirmed that true transcriptional targets exhibit a distinct regulatory profile across a database. Second, we leveraged the patterns of regulation across a database of gene sets to produce an automatically generated catalog of biological processes. These demonstrations revealed the power of a global analysis of the data contained within gene expression databases, and the potential for using them to address biological questions.
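The scoring idea described in the Results (enrichment of co-regulated genes relative to an empirical null distribution) can be illustrated with a minimal sketch. The abstract does not specify the statistic, so everything below is an assumption made for illustration: co-regulation is measured as the mean absolute pairwise Pearson correlation within the gene set, the null distribution is built from random gene sets of the same size, and the function names (`coregulation_score`, `empirical_p_value`, `make_toy_dataset`) and the synthetic data are hypothetical rather than the authors' implementation.

```python
import numpy as np


def coregulation_score(expr, idx):
    """Mean absolute pairwise Pearson correlation among the genes (rows) in idx.

    Assumed statistic for illustration; the abstract does not name one.
    """
    corr = np.corrcoef(expr[idx])                     # genes x genes correlation
    upper = corr[np.triu_indices(len(idx), k=1)]      # each gene pair counted once
    return float(np.mean(np.abs(upper)))


def empirical_p_value(expr, gene_set_idx, n_null=200, seed=0):
    """Score a gene set in one dataset against random sets of the same size."""
    rng = np.random.default_rng(seed)
    observed = coregulation_score(expr, gene_set_idx)
    n_genes = expr.shape[0]
    null = np.array([
        coregulation_score(
            expr, rng.choice(n_genes, size=len(gene_set_idx), replace=False)
        )
        for _ in range(n_null)
    ])
    # Add-one correction keeps the empirical p-value strictly above zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_null)
    return observed, float(p)


def make_toy_dataset(n_genes=200, n_samples=30, n_coregulated=10, seed=1):
    """Synthetic genes x samples matrix; the first rows share a common driver."""
    rng = np.random.default_rng(seed)
    expr = rng.normal(size=(n_genes, n_samples))
    driver = rng.normal(size=n_samples)
    expr[:n_coregulated] += 3.0 * driver
    return expr


expr = make_toy_dataset()
# Gene set made of the co-regulated rows versus a set of unrelated rows.
score_set, p_set = empirical_p_value(expr, np.arange(10))
score_rand, p_rand = empirical_p_value(expr, np.arange(100, 110))
print(f"co-regulated set: score={score_set:.2f}, p={p_set:.3f}")
print(f"unrelated set:    score={score_rand:.2f}, p={p_rand:.3f}")
```

Repeating this scoring over every dataset in a database would yield one regulation profile per gene set, which is the kind of database-wide pattern the abstract refers to. The choice of statistic and null model here is only one plausible option.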

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / pathology
  • Cell Cycle
  • Data Mining*
  • Databases, Genetic*
  • E2F Transcription Factors / metabolism
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks*
  • Humans

Substances

  • E2F Transcription Factors