Identification of co-occurring transcription factor binding sites from DNA sequence using clustered position weight matrices

Nucleic Acids Res. 2012 Mar;40(5):e38. doi: 10.1093/nar/gkr1252. Epub 2011 Dec 19.

Abstract

Accurate prediction of transcription factor binding sites (TFBSs) is a prerequisite for identifying cis-regulatory modules that underlie transcriptional regulatory circuits encoded in the genome. Here, we present a computational framework for detecting TFBSs when multiple position weight matrices (PWMs) are available for a transcription factor (TF). Grouping multiple PWMs of a TF based on their sequence similarity improves the specificity of TFBS prediction, as evaluated using multiple genome-wide ChIP-Seq data sets from 26 TFs. Z-scores of the area under the receiver operating characteristic curve (AUC) were calculated for 368 TFs and used to statistically identify co-occurring regulatory motifs in the TF-bound ChIP loci. Motifs co-occurring with the empirical binding of E2F, JUN or MYC were evaluated in the basal or stimulated condition. The results indicate that our method can be useful for systematically identifying the co-occurring motifs of a TF under given conditions.
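The pipeline described in the abstract (scoring sequences with a group of PWMs, measuring how well those scores separate bound from unbound loci by AUC, then standardizing AUCs across motifs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how scores from grouped PWMs are combined or how the Z-scores are normalized, so taking the best score across a cluster and standardizing across all motifs are assumptions here, and all function names are hypothetical. The sketch also assumes uppercase A/C/G/T sequences, a uniform background and forward-strand scanning only.

```python
import math
from statistics import mean, pstdev

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (a list of dicts) into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_score(pwm, sequence):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    if len(sequence) < width:
        return float("-inf")  # no window fits
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def cluster_score(pwms, sequence):
    """Assumed combination rule: a PWM cluster scores as its best member."""
    return max(best_score(p, sequence) for p in pwms)

def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win, which matches the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def z_scores(auc_by_motif):
    """Standardize AUC values across motifs (assumed normalization)."""
    values = list(auc_by_motif.values())
    mu, sigma = mean(values), pstdev(values)
    return {
        name: (value - mu) / sigma if sigma > 0 else 0.0
        for name, value in auc_by_motif.items()
    }
```

In use, one would compute `cluster_score` for every ChIP-bound sequence and every background sequence, pass the two score lists to `auc` for each motif cluster, and then rank motifs by `z_scores`. Motifs with high Z-scores are candidates for co-occurrence with the ChIP-profiled TF.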

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Base Sequence
  • Binding Sites
  • Conserved Sequence
  • E2F Transcription Factors / metabolism
  • Nucleotide Motifs
  • Position-Specific Scoring Matrices*
  • Proto-Oncogene Proteins c-jun / metabolism
  • Regulatory Elements, Transcriptional*
  • Sequence Analysis, DNA*
  • Software
  • Transcription Factors / metabolism*

Substances

  • E2F Transcription Factors
  • Proto-Oncogene Proteins c-jun
  • Transcription Factors