Epigenetic priors for identifying active transcription factor binding sites

Gabriel Cuellar-Partida; Fabian A Buske; Robert C McLeay; Tom Whitington; William Stafford Noble; Timothy L Bailey

doi:10.1093/bioinformatics/btr614

Epigenetic priors for identifying active transcription factor binding sites

Bioinformatics. 2012 Jan 1;28(1):56-62. doi: 10.1093/bioinformatics/btr614. Epub 2011 Nov 8.

Authors

Gabriel Cuellar-Partida¹, Fabian A Buske, Robert C McLeay, Tom Whitington, William Stafford Noble, Timothy L Bailey

Affiliation

¹ Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia.

Abstract

Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data such as histone modification and DNase I, accessibility data has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.

Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.

Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Animals
DNA / chemistry
DNA / metabolism
Epigenomics*
Gene Expression Regulation
Histone Code*
Humans
Models, Statistical*
Nucleotide Motifs
Sequence Analysis, DNA
Software*
Transcription Factors / chemistry
Transcription Factors / metabolism*

Substances

Transcription Factors
DNA

Abstract

Publication types

MeSH terms

Substances

Grants and funding