A novel statistical method for quantitative comparison of multiple ChIP-seq datasets

Bioinformatics. 2015 Jun 15;31(12):1889-96. doi: 10.1093/bioinformatics/btv094. Epub 2015 Feb 13.

Abstract

Motivation: ChIP-seq is a powerful technology for measuring protein binding or histone modification strength on a genome-wide scale. Although a number of methods are available for analyzing a single ChIP-seq dataset (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that account for data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are under-developed.

Results: In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks in all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then estimate the biological signals and compare them through hypothesis testing in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing methods.
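
As an illustration of the first step only (taking the union of peaks to form candidate regions, then counting IP reads in each region), a minimal Python sketch follows. It is not the ChIPComp implementation. The function names `union_peaks` and `count_reads`, the half-open interval convention, the merging of touching peaks and the use of single read positions are all assumptions made for this example.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Assumption: peaks are half-open intervals [start, end) keyed by chromosome.
Interval = Tuple[int, int]
PeakSet = Dict[str, List[Interval]]


def union_peaks(peak_sets: List[PeakSet]) -> PeakSet:
    """Pool peaks from all datasets and merge overlapping or touching ones."""
    pooled: PeakSet = {}
    for peaks in peak_sets:
        for chrom, intervals in peaks.items():
            pooled.setdefault(chrom, []).extend(intervals)

    merged: PeakSet = {}
    for chrom, intervals in pooled.items():
        regions: List[Interval] = []
        for start, end in sorted(intervals):
            if regions and start <= regions[-1][1]:
                # Extend the previous region to cover this peak.
                prev_start, prev_end = regions[-1]
                regions[-1] = (prev_start, max(prev_end, end))
            else:
                regions.append((start, end))
        merged[chrom] = regions
    return merged


def count_reads(regions: List[Interval], read_positions: List[int]) -> List[int]:
    """Count reads (one position per read) falling in each region."""
    positions = sorted(read_positions)
    return [
        bisect_left(positions, end) - bisect_left(positions, start)
        for start, end in regions
    ]


if __name__ == "__main__":
    # Hypothetical peak calls from two datasets.
    dataset_a = {"chr1": [(100, 200), (500, 600)]}
    dataset_b = {"chr1": [(150, 250)], "chr2": [(10, 20)]}
    candidates = union_peaks([dataset_a, dataset_b])
    print(candidates)
    print(count_reads(candidates["chr1"], [120, 249, 250, 550]))
```

The resulting per-region counts are the quantities the abstract assumes to be Poisson distributed; the modeling of the Poisson rates and the hypothesis tests are not sketched here because the abstract does not specify their form.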

Availability and implementation: An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Artifacts
  • Chromatin Immunoprecipitation / methods*
  • Computational Biology / methods*
  • Computer Simulation
  • Datasets as Topic
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing / methods
  • Histones / metabolism*
  • Humans
  • Models, Statistical*
  • Poisson Distribution
  • Protein Binding
  • Sequence Analysis, DNA / methods
  • Software*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Histones
  • Transcription Factors