A deterministic motif finding algorithm with application to the human genome

Bioinformatics. 2006 May 1;22(9):1047-54. doi: 10.1093/bioinformatics/btl037. Epub 2006 Feb 2.

Abstract

We present a novel algorithm, MaMF, for identifying transcription factor (TF) binding site motifs. The method is deterministic and relies on an indexing technique to optimize the search process. On common yeast datasets, MaMF performs competitively with other methods. We also present results on a challenging group of eight sets of human genes known to be responsive to a diverse group of TFs. In every case, MaMF finds the annotated motif among the top-scoring putative motifs. We compared MaMF against other motif finders on a larger group of 21 human gene sets and found that MaMF performs better than the other algorithms. We analyzed the remaining high-scoring motifs and show that many correspond to other TFs that are known to co-occur with the annotated TF motifs. The significant and frequent presence of co-occurring transcription factor binding sites explains in part the difficulty of human motif finding. MaMF is a very fast algorithm, suitable for application to large numbers of interesting gene sets.
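
The abstract does not describe the index or the scoring scheme, so the following is only a generic illustration of how indexing can make a motif search deterministic, not the MaMF method itself. It is a minimal Python sketch that assumes exact k-mer matching and ranks candidates by the number of input sequences containing them. The function names (`build_kmer_index`, `rank_kmers`) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to the set of sequence indices that contain it."""
    index = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(seq_id)
    return index


def rank_kmers(sequences, k, top=5):
    """Return the k-mers shared by the most sequences.

    Ties are broken alphabetically, so the same input always
    yields the same ranking (no random restarts or sampling).
    This is an assumed toy score, not the scoring used by MaMF.
    """
    index = build_kmer_index(sequences, k)
    ranked = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(kmer, len(ids)) for kmer, ids in ranked[:top]]


if __name__ == "__main__":
    # Toy promoter fragments (made up for illustration).
    promoters = ["ACGTGACC", "TTGTGACA", "GGTGACTT"]
    for kmer, support in rank_kmers(promoters, k=5, top=3):
        print(kmer, support)
```

Because the index is built in a single pass and the ranking uses a fixed tie-break, repeated runs on the same gene set return identical candidates. That reproducibility is the property the abstract contrasts with stochastic motif finders.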

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Base Sequence
  • Chromosome Mapping / methods*
  • Genome, Human
  • Humans
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / chemistry*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors