A methodology for motif discovery employing iterated cluster re-assignment

Comput Syst Bioinformatics Conf. 2006:257-68.

Abstract

Motif discovery is a crucial part of regulatory network identification and is therefore widely studied in the literature. Motif discovery programs search for statistically significant, well-conserved, and over-represented patterns in given promoter sequences. When gene expression data are available, there are three main paradigms for motif discovery: cluster-first, regression, and joint probabilistic. Regardless of the paradigm employed, the success of motif discovery depends heavily on the homogeneity of the input sequences. In this work, we propose a methodology for extracting homogeneous subsets of the input sequences to improve motif discovery performance. It unifies the cluster-first and regression paradigms through iterative cluster re-assignment. The experimental results show the effectiveness of the methodology.
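To make the idea of iterative cluster re-assignment concrete, the following is a minimal toy sketch and not the procedure of the paper, whose details the abstract does not give. It assumes a deliberately simple stand-in for motif discovery (the most enriched k-mer per cluster) and re-assigns each sequence to the cluster whose motif it matches best. It omits expression data and the regression step entirely. All names (`enriched_kmer`, `iterate_reassignment`, the parameter `k`) are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def enriched_kmer(cluster_seqs, all_seqs, k):
    """Toy motif finder (an assumption, not the paper's method): pick the
    k-mer whose presence is most over-represented in the cluster relative
    to all sequences. Assumes every sequence has length >= k."""
    inside = Counter(m for s in cluster_seqs for m in set(kmers(s, k)))
    overall = Counter(m for s in all_seqs for m in set(kmers(s, k)))

    def enrichment(m):
        return inside[m] / len(cluster_seqs) - overall[m] / len(all_seqs)

    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(inside), key=enrichment)


def count_occurrences(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(1 for m in kmers(seq, len(motif)) if m == motif)


def iterate_reassignment(seqs, labels, k=4, max_iter=10):
    """Alternate between per-cluster motif discovery and re-assignment of
    sequences until the labels stop changing or max_iter is reached."""
    labels = list(labels)
    motifs = {}
    for _ in range(max_iter):
        # Step 1: discover one motif per currently non-empty cluster.
        motifs = {}
        for c in sorted(set(labels)):
            members = [s for s, lab in zip(seqs, labels) if lab == c]
            motifs[c] = enriched_kmer(members, seqs, k)
        # Step 2: move each sequence to the cluster whose motif occurs most
        # often in it; on a tie with the current cluster, stay put.
        new_labels = [
            max(motifs, key=lambda c: (count_occurrences(s, motifs[c]), c == cur))
            for s, cur in zip(seqs, labels)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, motifs
```

For example, with the made-up sequences `["GGACGTGG", "CCACGTCC", "AATTGAAA", "CCTTGACC", "GAACGTAG"]` and initial labels `[0, 0, 1, 1, 1]`, the last sequence shares the k-mer `ACGT` with cluster 0 and is moved there in the first iteration, leaving two more homogeneous clusters.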

MeSH terms

  • Amino Acid Motifs
  • Cluster Analysis
  • Computational Biology / methods*
  • Models, Genetic
  • Models, Statistical
  • Multigene Family
  • Probability
  • Protein Conformation
  • Protein Structure, Tertiary
  • Proteomics / methods
  • Regression Analysis
  • Saccharomyces cerevisiae / metabolism
  • Transcription Factors / chemistry

Substances

  • Transcription Factors