A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data: a case study using E2F1

Genome Res. 2006 Dec;16(12):1585-95. doi: 10.1101/gr.5520206. Epub 2006 Oct 19.

Abstract

Advances in high-throughput technologies, such as ChIP-chip, and the completion of human and mouse genomic sequences now allow analysis of the mechanisms of gene regulation at the systems level. In this study, we have developed a computational genomics approach (termed ChIPModules), which begins with experimentally determined binding sites and integrates positional weight matrices constructed from transcription factor binding sites, a comparative genomics approach, and statistical learning methods to identify transcriptional regulatory modules. We began with E2F1 binding site information obtained from ChIP-chip analyses of ENCODE regions, from both HeLa and MCF7 cells. Our approach not only distinguished targets from nontargets with high specificity but also identified five regulatory modules for E2F1. One of the identified modules predicted a colocalization of E2F1 and AP-2α on a set of target promoters with an intersite distance of less than 270 bp. We tested this prediction using ChIP-chip assays with arrays containing approximately 14,000 human promoters. We found that both E2F1 and AP-2α bind within the predicted distance to a large number of human promoters, demonstrating the strength of our sequence-based, unbiased, and universal protocol. Finally, we have used our ChIPModules approach to develop a database that includes thousands of computationally identified and/or experimentally verified E2F1 target promoters.
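
The two sequence-level ideas in the abstract, scoring candidate sites with a weight matrix and asking whether sites for two factors fall within a fixed intersite distance, can be illustrated with a minimal sketch. This is not the ChIPModules implementation: the function names, the toy count matrix, the pseudocount, the uniform background, and the score threshold are all hypothetical, and only the 270 bp cutoff is taken from the abstract. The comparative genomics and statistical learning steps are not represented.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log-odds weight matrix.

    `counts` is a list of dicts, one per motif position, mapping each base
    to its observed count. Pseudocount and background are assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return forward-strand start positions whose score meets the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def colocalized(sites_a, sites_b, max_distance=270):
    """True if any site in sites_a is less than max_distance bp from one in sites_b."""
    return any(abs(a - b) < max_distance for a in sites_a for b in sites_b)


# Toy motif with consensus GCGC (hypothetical, not a real E2F1 matrix).
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
toy_hits = scan("AAGCGCAA", toy_pwm, threshold=5.0)
```

In practice one would scan both strands and calibrate the threshold against background sequence; the sketch keeps to the forward strand for brevity.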

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Pairing
  • Binding Sites
  • Breast Neoplasms / pathology
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation*
  • Computational Biology*
  • E2F1 Transcription Factor / genetics
  • E2F1 Transcription Factor / metabolism*
  • Genomics*
  • HeLa Cells
  • Humans
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis*
  • Promoter Regions, Genetic
  • ROC Curve
  • Regulatory Sequences, Nucleic Acid*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software

Substances

  • E2F1 Transcription Factor

Associated data

  • GEO/GPL3930
  • GEO/GSE5174
  • GEO/GSM116738
  • GEO/GSM116739
  • GEO/GSM116740
  • GEO/GSM116741
  • GEO/GSM116742