PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data

Bioinformatics. 2014 Sep 15;30(18):2568-75. doi: 10.1093/bioinformatics/btu372. Epub 2014 Jun 3.

Abstract

Motivation: ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking.

Results: We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.

Availability and implementation: http://code.google.com/p/pepr-chip-seq/.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Animals
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation / methods*
  • Epigenomics
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Mice
  • Nucleotide Motifs
  • Oligonucleotide Array Sequence Analysis
  • Sequence Analysis, DNA
  • Transcription Factors / metabolism

Substances

  • Transcription Factors