Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection

Qi Zhang; Xin Zeng; Sam Younkin; Trupti Kawli; Michael P Snyder; Sündüz Keleş

doi:10.1186/s12859-016-0957-1

BMC Bioinformatics. 2016 Feb 24:17:96. doi: 10.1186/s12859-016-0957-1.

Authors

Qi Zhang¹, Xin Zeng², Sam Younkin³, Trupti Kawli⁴, Michael P Snyder⁵ ⁶, Sündüz Keleş⁷ ⁸

Affiliations

¹ Department of Statistics, University of Nebraska Lincoln, Lincoln, Nebraska, USA.
² Department of Statistics, University of Wisconsin Madison, Madison, Wisconsin, USA.
³ Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, Madison, Wisconsin, USA.
⁴ Department of Genetics, Stanford University School of Medicine, Palo Alto, California, USA.
⁵ Department of Genetics, Stanford University School of Medicine, Palo Alto, California, USA.
⁶ Stanford Center for Genomics and Personalized Medicine, Palo Alto, California, USA.
⁷ Department of Statistics, University of Wisconsin Madison, Madison, Wisconsin, USA.
⁸ Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, Madison, Wisconsin, USA.

Abstract

Background: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection.

Results: We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed a complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection.

Conclusions: Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Alleles*
Chromatin Immunoprecipitation / methods*
Genomics / methods*
Humans
Sequence Alignment
Transcription Factors / metabolism

Substances

Transcription Factors
