PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data

Yanxiao Zhang; Yu-Hsuan Lin; Timothy D Johnson; Laura S Rozek; Maureen A Sartor

doi:10.1093/bioinformatics/btu372

Bioinformatics. 2014 Sep 15;30(18):2568-75. doi: 10.1093/bioinformatics/btu372. Epub 2014 Jun 3.

Authors

Yanxiao Zhang¹, Yu-Hsuan Lin¹, Timothy D Johnson¹, Laura S Rozek¹, Maureen A Sartor²

Affiliations

¹ Department of Computational Medicine and Bioinformatics, Department of Biostatistics and Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
² Department of Computational Medicine and Bioinformatics, Department of Biostatistics and Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

Motivation: ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking.

Results: We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.
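The idea of ranking windows with a negative binomial model and a locally estimated variance can be illustrated with a minimal sketch. This is not PePr's implementation: the function names, the method-of-moments dispersion estimate, the moving-average smoothing, the pseudocount of 0.5, the Wald-style statistic and the toy read counts are all assumptions chosen for illustration. It only shows why, at equal fold change, windows that are consistent across replicates score higher than windows that vary.

```python
import math
from statistics import mean


def moment_dispersion(counts):
    """Method-of-moments NB dispersion phi, from var = mu + phi * mu**2."""
    if len(counts) < 2:
        return 0.0
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = sum((c - mu) ** 2 for c in counts) / (len(counts) - 1)
    return max((var - mu) / (mu * mu), 0.0)


def local_dispersion(window_counts, half_width=1):
    """Average the per-window dispersion over neighboring windows (assumed smoother)."""
    raw = [moment_dispersion(c) for c in window_counts]
    smoothed = []
    for i in range(len(raw)):
        lo = max(0, i - half_width)
        hi = min(len(raw), i + half_width + 1)
        smoothed.append(mean(raw[lo:hi]))
    return smoothed


def wald_z(group_a, group_b, phi):
    """Wald-style z for the log ratio of group means under NB variance."""
    mu_a = mean(group_a) + 0.5  # pseudocount avoids log(0); an assumption
    mu_b = mean(group_b) + 0.5
    # Delta method: Var(log mean) is roughly (1/mu + phi) / n
    var = (1 / mu_a + phi) / len(group_a) + (1 / mu_b + phi) / len(group_b)
    return (math.log(mu_a) - math.log(mu_b)) / math.sqrt(var)


def score_windows(group_a, group_b, half_width=1):
    """Return one z score per window, sharing one dispersion across both groups."""
    phi_a = local_dispersion(group_a, half_width)
    phi_b = local_dispersion(group_b, half_width)
    return [
        wald_z(a, b, (pa + pb) / 2)
        for a, b, pa, pb in zip(group_a, group_b, phi_a, phi_b)
    ]


# Toy data (hypothetical): 6 windows x 3 replicates, all with mean 50 in ChIP.
chip = [
    [50, 52, 48], [51, 49, 50], [49, 50, 51],  # consistent replicates
    [90, 10, 50], [95, 5, 50], [85, 15, 50],   # highly variable replicates
]
control = [[10, 11, 9] for _ in range(6)]

z = score_windows(chip, control)
ranked = sorted(range(len(z)), key=lambda i: -z[i])
print(ranked)
```

In this toy input every window has the same mean enrichment, so the ordering is driven entirely by the dispersion term: the consistent windows receive a smaller estimated variance and therefore a larger test statistic.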

Availability and implementation: http://code.google.com/p/pepr-chip-seq/.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Animals
Cell Line, Tumor
Chromatin Immunoprecipitation / methods*
Epigenomics
Genomics / methods*
High-Throughput Nucleotide Sequencing / methods*
Mice
Nucleotide Motifs
Oligonucleotide Array Sequence Analysis
Sequence Analysis, DNA
Transcription Factors / metabolism

Substances

Transcription Factors
