Computational analysis and modeling of genome-scale avidity distribution of transcription factor binding sites in ChIP-PET experiments

Genome Inform. 2007:19:83-94.

Abstract

Advances in high-throughput technologies such as ChIP-chip and ChIP-PET (Chromatin Immunoprecipitation Paired-End diTag), together with the availability of the human and mouse genome sequences, now make it possible to identify transcription factor binding sites (TFBSs) and to analyze mechanisms of gene regulation at the level of the entire genome. Here, we develop a computational approach that uses ChIP-PET data and statistical modeling to assess experimental noise and identify reliable TFBSs for the c-Myc, STAT1 and p53 transcription factors in the human genome. We propose a probabilistic mixture model and develop programs for Monte Carlo simulation of ChIP-PET data to define the background noise of sequence clustering and to identify the probability function of specific DNA-protein binding in the eukaryotic genome. The method is highly reproducible; it not only distinguishes bona fide TFBSs from non-specific ones with high specificity, but also provides an algorithmic and computational basis for further optimization of the experimental parameters of the ChIP-PET method.
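
The idea of using Monte Carlo simulation to define the background noise of sequence clustering can be illustrated with a minimal sketch. It is not the authors' program. It assumes, purely for illustration, that noise fragments all have one fixed length, that they fall uniformly at random on a single linear genome, and that a cluster is a maximal run of overlapping fragments. The function names (cluster_sizes, simulate_background, noise_fraction) and all parameters are hypothetical, and the noise estimate is a simple ratio of expected background counts to observed counts rather than the mixture model described in the abstract.

```python
import random
from collections import Counter


def cluster_sizes(starts, frag_len):
    """Group fragments [s, s + frag_len) into maximal overlapping clusters.

    Returns the number of fragments in each cluster, in genomic order.
    Assumption: every fragment has the same length.
    """
    sizes = []
    current_end = None
    count = 0
    for s in sorted(starts):
        if current_end is not None and s < current_end:
            # Fragment overlaps the running cluster: extend it.
            count += 1
            current_end = max(current_end, s + frag_len)
        else:
            # Gap found: close the previous cluster and start a new one.
            if count:
                sizes.append(count)
            count = 1
            current_end = s + frag_len
    if count:
        sizes.append(count)
    return sizes


def simulate_background(n_pets, genome_len, frag_len, n_trials, seed=0):
    """Mean number of clusters of each size under uniform random placement."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_trials):
        starts = [rng.randrange(genome_len - frag_len) for _ in range(n_pets)]
        totals.update(cluster_sizes(starts, frag_len))
    return {size: n / n_trials for size, n in sorted(totals.items())}


def noise_fraction(background, observed):
    """Estimated share of observed clusters of each size explained by noise.

    background: size -> expected count from simulation.
    observed:   size -> count in the real library.
    The ratio is capped at 1.0.
    """
    return {
        size: min(1.0, background.get(size, 0.0) / n)
        for size, n in observed.items()
        if n > 0
    }
```

For example, noise_fraction(simulate_background(...), observed_counts) gives, for each cluster size, a rough estimate of how much of the observed signal random clustering alone could produce. Under these assumptions, cluster sizes for which the fraction is close to zero are the ones least likely to arise from background.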

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites
  • Chromatin Immunoprecipitation
  • Cluster Analysis
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Regulation*
  • Gene Library
  • Genome
  • Humans
  • Models, Theoretical
  • Monte Carlo Method
  • Software
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors