A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information

Xiaotu Ma; Ashwinikumar Kulkarni; Zhihua Zhang; Zhenyu Xuan; Robert Serfling; Michael Q Zhang

doi:10.1093/nar/gkr1135

Nucleic Acids Res. 2012 Apr;40(7):e50. doi: 10.1093/nar/gkr1135. Epub 2012 Jan 6.

Authors

Xiaotu Ma¹, Ashwinikumar Kulkarni, Zhihua Zhang, Zhenyu Xuan, Robert Serfling, Michael Q Zhang

Affiliation

¹ Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA.

Abstract

Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduce a new measure to discover functional k-mers. Using simulation, we demonstrate that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). We applied our method to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.
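
The idea that functional k-mers cluster around peak summits can be illustrated with a minimal Python sketch. This is not the measure defined in the paper, which the abstract does not specify. The function name `kmer_summit_enrichment`, the parameters `k` and `window`, and the assumption that each input sequence is centered on its peak summit are all hypothetical choices made for illustration. Reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmer_summit_enrichment(seqs, k=4, window=5):
    """Illustrative only: for each k-mer, return a tuple of
    (total occurrences, fraction of occurrences centered within
    `window` bp of the sequence midpoint)."""
    total = defaultdict(int)  # all occurrences of each k-mer
    near = defaultdict(int)   # occurrences close to the assumed summit
    for seq in seqs:
        seq = seq.upper()
        # Assumption: the peak summit sits at the sequence midpoint.
        summit = len(seq) / 2.0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other symbols
            center = i + k / 2.0
            total[kmer] += 1
            if abs(center - summit) <= window:
                near[kmer] += 1
    return {kmer: (n, near[kmer] / n) for kmer, n in total.items()}


# Toy input: one 20 bp sequence with GATA placed exactly at the midpoint.
scores = kmer_summit_enrichment(["CCCCCCCCGATACCCCCCCC"], k=4, window=3)
print(scores["GATA"], scores["CCCC"])
```

In this toy input, the centrally placed k-mer has all of its occurrences near the midpoint, while the flanking background k-mer has none. A real analysis would compare the observed fraction against a background expectation across many peaks.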

Publication types

Evaluation Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Binding Sites
CCCTC-Binding Factor
Chromatin Immunoprecipitation*
Cluster Analysis
Computer Simulation
Drosophila melanogaster / genetics
Embryonic Stem Cells / metabolism
Gene Regulatory Networks
High-Throughput Nucleotide Sequencing
Mice
Nucleotide Motifs
Oligonucleotide Array Sequence Analysis
Regulatory Elements, Transcriptional*
Repressor Proteins
Sequence Analysis, DNA
Software*
Transcription Factors / metabolism*

Substances

CCCTC-Binding Factor
Ctcf protein, mouse
Repressor Proteins
Transcription Factors

Grants and funding

HG001696/HG/NHGRI NIH HHS/United States