Motivation: ChIP-seq data are enriched in binding sites for the immunoprecipitated protein. Some sequences may also contain binding sites for a coregulator. Biologists are interested in knowing which coregulatory factor motifs may be present in the sequences bound by the ChIP'ed protein.
Results: We present coMOTIF, a finite mixture framework with an expectation-maximization algorithm that considers two motifs jointly and simultaneously determines which sequences contain both motifs, only one of them, or neither. Tested on 10 simulated ChIP-seq datasets, our method performed better than repeated application of MEME in predicting sequences containing both motifs. When applied to a mouse liver Foxa2 ChIP-seq dataset involving ~12 000 400-bp sequences, coMOTIF identified co-occurrence of Foxa2 with Hnf4a, Cebpa, E-box, Ap1/Maf or Sp1 motifs in ~6-33% of these sequences. These motifs are bound by transcription factors that are either known to be liver-specific or have an important role in liver function.
Availability: Freely available at http://www.niehs.nih.gov/research/resources/software/comotif/.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
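
Illustration: The four sequence classes named under Results (neither motif, first motif only, second motif only, both) can be treated as components of a finite mixture. The sketch below is a deliberately simplified illustration and not the coMOTIF implementation. It assumes both motif models are fixed and that each sequence has already been summarized by two likelihood ratios (motif present versus background), that the "both" class factorizes as the product of the two ratios, and that only the mixing proportions are estimated by expectation-maximization. The function name `em_four_class` and its parameters are hypothetical.

```python
def em_four_class(lr_a, lr_b, n_iter=200, tol=1e-9):
    """Estimate mixing proportions for four sequence classes by EM.

    lr_a[i], lr_b[i]: assumed precomputed likelihood ratios
    (motif present vs. background) of sequence i for motifs A and B.
    Classes: 0 = neither, 1 = A only, 2 = B only, 3 = both.
    Returns (weights, responsibilities from the last E-step).
    """
    n = len(lr_a)
    weights = [0.25, 0.25, 0.25, 0.25]  # uniform starting proportions
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every sequence.
        # Likelihoods are expressed relative to the background model,
        # so the "neither" class contributes a factor of 1.
        resp = []
        for a, b in zip(lr_a, lr_b):
            joint = [weights[0], weights[1] * a, weights[2] * b,
                     weights[3] * a * b]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: each new proportion is the mean responsibility.
        new_weights = [sum(r[k] for r in resp) / n for k in range(4)]
        change = max(abs(x - y) for x, y in zip(weights, new_weights))
        weights = new_weights
        if change < tol:
            break
    return weights, resp
```

Each sequence can then be assigned to the class with the largest responsibility. A full treatment along the lines described under Results would also re-estimate the motif models inside the loop, which this sketch omits.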