Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

BMC Genomics. 2016 Nov 4;17(1):873. doi: 10.1186/s12864-016-3164-6.

Abstract

Background: ChIP-nexus, an extension of the ChIP-exo protocol, maps the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA, and enables selective removal of PCR duplicates using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a few software packages are available for the analysis of ChIP-exo data, and these have not yet been systematically tested and compared on ChIP-nexus data.

Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method, as well as two other methods, MACE and MACS2, to the available ChIP-nexus data.
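
The idea of barcode-aware duplicate removal can be illustrated with a minimal sketch. It is not the Q-nexus implementation; it assumes that reads count as PCR duplicates only when they share chromosome, strand, 5' position and random barcode. The `Read` tuple, its field names, and the function `remove_pcr_duplicates` are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Hypothetical minimal read record; real pipelines would parse this from alignments.
Read = namedtuple("Read", ["chrom", "strand", "five_prime", "barcode"])


def remove_pcr_duplicates(reads):
    """Keep the first read for each (chrom, strand, 5' position, barcode) key.

    Assumption: two reads are PCR duplicates only if all four values match.
    Reads at the same position with different barcodes are kept, since they
    are assumed to originate from distinct DNA fragments.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.strand, read.five_prime, read.barcode)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = [
    Read("chr2L", "+", 1000, "ACGTA"),
    Read("chr2L", "+", 1000, "ACGTA"),  # same position and barcode: duplicate
    Read("chr2L", "+", 1000, "TTGCA"),  # same position, other barcode: kept
    Read("chr2L", "-", 1000, "ACGTA"),  # other strand: kept
]
deduplicated = remove_pcr_duplicates(reads)
```

A position-only filter would collapse the first three reads into one, which is why the barcode is included in the key here.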

Conclusions: The Q-nexus software is efficient and easy to use. It calculates novel statistics on duplication rates that take the random barcodes into account. Our method for estimating the width of the protected region yields unbiased signatures that are highly reproducible across biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows substantially better reproducibility than the other methods tested. An implementation of Q-nexus is available at http://charite.github.io/Q/ .
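
To make the notion of a duplication rate with and without random barcodes concrete, here is a small hedged sketch. The exact statistics reported by Q-nexus are not specified in this abstract; the definition used here (one minus the fraction of distinct keys) and the function name `duplication_rate` are assumptions made for illustration.

```python
def duplication_rate(keys):
    """Return 1 - (number of distinct keys / number of keys).

    Assumed definition for illustration only; an empty input gives 0.0.
    """
    keys = list(keys)
    if not keys:
        return 0.0
    return 1.0 - len(set(keys)) / len(keys)


# Toy reads as (chromosome, strand, 5' position, random barcode) tuples.
toy_reads = [
    ("chr2L", "+", 1000, "ACGTA"),
    ("chr2L", "+", 1000, "ACGTA"),
    ("chr2L", "+", 1000, "TTGCA"),
    ("chr2L", "-", 1000, "ACGTA"),
]

# Duplicates defined by mapping position alone (barcode ignored).
rate_position_only = duplication_rate((c, s, p) for c, s, p, _ in toy_reads)

# Duplicates defined by mapping position plus random barcode.
rate_with_barcode = duplication_rate(toy_reads)
```

In this toy data the position-only rate is higher than the barcode-aware rate, because reads that share a position but carry different barcodes are no longer counted as duplicates.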

Keywords: Algorithm; Bioinformatics; ChIP-exo; ChIP-nexus; Chromatin immunoprecipitation; Duplication rates; Library complexity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • Chromatin Immunoprecipitation*
  • Computational Biology / methods*
  • DNA-Binding Proteins / metabolism
  • High-Throughput Nucleotide Sequencing*
  • Nucleotide Motifs
  • Protein Binding
  • Reproducibility of Results
  • Software*
  • Transcription Factors / metabolism

Substances

  • DNA-Binding Proteins
  • Transcription Factors