The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding

Karl Kornacker; Morten Beck Rye; Tony Håndstad; Finn Drabløs

doi:10.1186/1471-2105-13-176


BMC Bioinformatics. 2012 Jul 24;13:176. doi: 10.1186/1471-2105-13-176.

Authors

Karl Kornacker¹, Morten Beck Rye, Tony Håndstad, Finn Drabløs

Affiliation

¹ Division of Sensory Biophysics, Ohio State University, Columbus, OH, USA.

Abstract

Background: Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) is the most frequently used method to identify the binding sites of transcription factors. Active binding sites appear as peaks in enrichment profiles when the sequencing reads are mapped to a reference genome. However, these profiles are typically noisy, which makes it challenging to identify all significantly enriched regions reliably and with an acceptable false discovery rate.
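As a rough illustration of what an "enrichment profile" is, the sketch below builds a per-base coverage profile for one chromosome from mapped reads. It assumes a common ChIP-Seq convention that is not described in this abstract: each read is extended in its 3' direction to an assumed fragment length. The function name `coverage_profile`, its parameters, and the 0-based coordinate convention (for reverse-strand reads, the given position is taken as the read's 5' end, i.e. its rightmost base) are hypothetical choices for this example, not part of the Triform implementation.

```python
from itertools import accumulate


def coverage_profile(read_positions, strands, fragment_length, chrom_length):
    """Per-base fragment coverage for one chromosome (hypothetical helper).

    read_positions: 0-based 5' end of each mapped read.
    strands: "+" or "-" for each read.
    Each read is extended 3'-ward to `fragment_length` bases (an assumption).
    """
    # Difference array: +1 where a fragment starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for pos, strand in zip(read_positions, strands):
        if strand == "+":
            lo, hi = pos, pos + fragment_length
        else:
            # Reverse-strand fragments extend leftwards from the 5' end.
            lo, hi = pos - fragment_length + 1, pos + 1
        # Clip the fragment to the chromosome bounds.
        lo, hi = max(lo, 0), min(hi, chrom_length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    # Prefix sums of the difference array give the coverage at each base.
    return list(accumulate(diff))[:chrom_length]


profile = coverage_profile([2, 9], ["+", "-"], 4, 12)
print(profile)
```

The difference-array approach keeps the cost proportional to the number of reads plus the chromosome length, rather than to the total number of covered bases. Real profiles are far noisier than this toy example, which is the difficulty the Background paragraph describes.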

Results: We present the Triform algorithm, an improved approach to automatic peak finding in ChIP-Seq enrichment profiles for transcription factors. The method uses model-free statistics to identify peak-like distributions of sequencing reads, combining an improved peak definition with known characteristics of ChIP-Seq data.
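The abstract does not give Triform's actual test statistics, so the following is only a toy illustration of the general idea of a "peak-like" read distribution: a central window whose coverage clearly exceeds that of both neighbouring flanks. The function `is_peak_like`, the equal-width three-window layout, and the `min_ratio` threshold are all hypothetical and should not be read as the Triform criterion, which is model-free and statistically calibrated in ways this sketch does not attempt.

```python
def is_peak_like(coverage, center, width, min_ratio=2.0):
    """Hypothetical shape check on a coverage profile (not the Triform test).

    Compares a central window of `width` bases with equally wide left and
    right flanks; returns True if the central window's total coverage is
    positive and at least `min_ratio` times the larger flank total.
    """
    start = center - width // 2
    # Require both flanks to lie fully inside the profile.
    if start - width < 0 or start + 2 * width > len(coverage):
        return False
    left = sum(coverage[start - width:start])
    middle = sum(coverage[start:start + width])
    right = sum(coverage[start + width:start + 2 * width])
    return middle > 0 and middle >= min_ratio * max(left, right)


# A raised block of coverage flanked by background looks peak-like;
# flat coverage does not.
bump = [1] * 10 + [5] * 10 + [1] * 10
flat = [3] * 30
print(is_peak_like(bump, center=15, width=10))
print(is_peak_like(flat, center=15, width=10))
```

Requiring enrichment over both flanks, rather than over a global background alone, is one simple way to favour localized peak shapes over broad enriched regions.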

Conclusions: Triform outperforms several existing methods in identifying representative peak profiles in curated benchmark data sets. We also show that, in many cases, Triform identifies peaks that are more consistent with biological function than those found by other methods. Finally, we show that Triform can generate novel information on transcription factor binding in repeat regions, which are a particular challenge in many ChIP-Seq experiments. The Triform algorithm has been implemented in R and is available via http://tare.medisin.ntnu.no/triform.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Binding Sites
Chromatin Immunoprecipitation / methods*
High-Throughput Nucleotide Sequencing*
Sensitivity and Specificity
Sequence Analysis, DNA*
Transcription Factors / metabolism*

Substances

Transcription Factors