TIP: a probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles

Chao Cheng; Renqiang Min; Mark Gerstein

doi:10.1093/bioinformatics/btr552

Bioinformatics. 2011 Dec 1;27(23):3221-7. doi: 10.1093/bioinformatics/btr552. Epub 2011 Oct 29.

Authors

Chao Cheng¹, Renqiang Min, Mark Gerstein

Affiliation

¹ Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA.

Abstract

Motivation: ChIP-seq and ChIP-chip experiments have been widely used to identify transcription factor (TF) binding sites and target genes. Conventionally, a fairly 'simple' approach is employed for target gene identification, e.g. finding genes with binding sites within 2 kb of a transcription start site (TSS). However, this approach does not take into account the number of sites upstream of the TSS, their exact positioning or the fact that different TFs appear to act at different characteristic distances from the TSS.
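As an illustration of the 'simple' approach described above, the following is a minimal sketch, not the authors' code. The data layout (a dict of peak positions per chromosome and a dict of TSS coordinates per gene), the function name simple_targets and the toy coordinates are all assumptions made for this example; only the 2 kb window comes from the text.

```python
def simple_targets(peaks, tss, window=2000):
    """Return genes with at least one peak within `window` bp of their TSS.

    peaks: dict mapping chromosome -> list of peak positions (hypothetical layout)
    tss:   dict mapping gene name -> (chromosome, TSS position)
    """
    targets = set()
    for gene, (chrom, pos) in tss.items():
        # A gene is called a target if any peak on its chromosome
        # lies within the fixed window around the TSS.
        if any(abs(p - pos) <= window for p in peaks.get(chrom, [])):
            targets.add(gene)
    return targets


# Toy data, invented for illustration only.
peaks = {"chr1": [1500, 9000], "chr2": [500]}
tss = {"geneA": ("chr1", 1000), "geneB": ("chr1", 5000), "geneC": ("chr2", 4000)}

print(sorted(simple_targets(peaks, tss)))  # prints ['geneA']
```

The result is a yes/no call per gene: the number of peaks, their strength and their exact distance from the TSS are all discarded, which is the limitation the abstract points out.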

Results: Here we propose a probabilistic model called target identification from profiles (TIP) that quantitatively measures the regulatory relationships between TFs and target genes. For each TF, our model builds a characteristic, averaged profile of binding around the TSS and then uses this to weight the sites associated with a given gene, providing a continuous-valued 'regulatory' score relating each TF and potential target. Moreover, the score can readily be turned into a ranked list of target genes and an estimate of significance, which is useful for case-dependent downstream analysis.
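The idea of a profile-weighted score can be sketched as follows. This is a hedged illustration under stated assumptions, not the published TIP implementation: the input is assumed to be a genes-by-positions matrix of binding signal in a fixed window around each TSS, the weights are assumed to be the averaged profile normalized to sum to 1, and significance is assumed to come from a normal approximation to standardized scores. The function name tip_like_scores and the simulated data are hypothetical.

```python
import math

import numpy as np


def tip_like_scores(signal):
    """Score genes by weighting their binding signal with the averaged profile.

    signal: array of shape (n_genes, n_positions); each row is the binding
            signal at positions around one gene's TSS (assumed input layout).
    Returns (weights, scores, pvals, order).
    """
    signal = np.asarray(signal, dtype=float)
    # Characteristic profile for this TF: mean signal at each position
    # relative to the TSS, averaged over all genes.
    profile = signal.mean(axis=0)
    # Assumption: normalize the profile so that the weights sum to 1.
    weights = profile / profile.sum()
    # Continuous-valued score: weighted sum of each gene's own signal.
    scores = signal @ weights
    # Assumption: standardize scores and use a one-sided normal tail
    # as a rough estimate of significance.
    z = (scores - scores.mean()) / scores.std()
    pvals = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    # Ranked list: gene indices from highest to lowest score.
    order = np.argsort(-scores)
    return weights, scores, pvals, order


# Simulated data, invented for illustration: 200 genes, 41 positions,
# background noise plus one gene with strong signal near the TSS.
rng = np.random.default_rng(0)
signal = rng.poisson(1.0, size=(200, 41)).astype(float)
signal[0, 18:23] += 20.0

weights, scores, pvals, order = tip_like_scores(signal)
print("top-ranked gene index:", order[0])
```

Unlike a fixed-window cutoff, every position contributes to the score in proportion to how often the TF is observed to bind there, so the same code adapts to TFs with different characteristic distances from the TSS.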

Conclusion: We show the advantages of TIP by comparing it to the 'simple' approach on several representative datasets, using motif occurrence and relationship to knock-out experiments as validation metrics. Moreover, we show that the probabilistic model is less sensitive than the simple approach to various experimental parameters (including sequencing depth and peak-calling method); in fact, its weaker dependence on sequencing depth could allow the results of a ChIP-seq experiment to be used in a more 'cost-effective' manner.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Amino Acid Motifs
Animals
Binding Sites
Chromatin Immunoprecipitation
Estrogen Receptor alpha / metabolism
Gene Expression Regulation
Mice
Models, Statistical*
Oligonucleotide Array Sequence Analysis
Protein Binding
STAT4 Transcription Factor / metabolism
Sequence Analysis, DNA
Transcription Factors / chemistry
Transcription Factors / genetics
Transcription Factors / metabolism*
Transcription Initiation Site

Substances

Estrogen Receptor alpha
STAT4 Transcription Factor
Stat4 protein, mouse
Transcription Factors