A methodology for motif discovery employing iterated cluster re-assignment

Osman Abul; Finn Drabløs; Geir Kjetil Sandve

Comput Syst Bioinformatics Conf. 2006:257-68.

Authors

Osman Abul¹, Finn Drabløs, Geir Kjetil Sandve

Affiliation

¹ Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.

PMID: 17369644

Abstract

Motif discovery is a crucial part of regulatory network identification and is therefore widely studied in the literature. Motif discovery programs search for statistically significant, well-conserved and over-represented patterns in given promoter sequences. When gene expression data is available, there are three main paradigms for motif discovery: cluster-first, regression, and joint probabilistic. The success of motif discovery depends heavily on the homogeneity of the input sequences, regardless of the paradigm employed. In this work, we propose a methodology for obtaining homogeneous subsets of the input sequences in order to improve motif discovery performance. It unifies the cluster-first and regression paradigms through iterative cluster re-assignment. The experimental results show the effectiveness of the methodology.
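The abstract does not spell out the algorithm, so the following is only a generic illustration of what iterating between a regression step and a cluster re-assignment step can look like. It is not the authors' method. It uses plain clusterwise linear regression as a stand-in, and every name in it (`fit_line`, `reassign_until_stable`, the toy data) is hypothetical. Each gene is assumed to be reduced to a pair (x, y), where x is a made-up motif match score for its promoter and y is its expression level.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b; flat line if x has no spread."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def reassign_until_stable(points, labels, n_clusters, max_iter=50):
    """Alternate per-cluster regression and re-assignment until labels stop changing.

    Assumption: a gene belongs to the cluster whose regression model
    predicts its expression with the smallest squared error.
    """
    labels = list(labels)
    models = {}
    for _ in range(max_iter):
        # Regression step: fit one model per non-empty cluster.
        models = {}
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                models[k] = fit_line(members)
        # Re-assignment step: move each gene to its best-fitting model.
        new_labels = [
            min(models, key=lambda k: (y - (models[k][0] * x + models[k][1])) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models


# Toy data (invented): two groups of six genes following different trends.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5),
          (0, 20), (1, 19), (2, 18), (3, 17), (4, 16), (5, 15)]
# Initial clustering with one gene from each group placed in the wrong cluster.
start = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]

labels, models = reassign_until_stable(points, start, n_clusters=2)
print(labels)
```

On this toy input the two misplaced genes move back to the cluster whose model fits them, leaving two homogeneous subsets. Real motif discovery would replace the single score x with sequence-derived features and would need a more careful initialisation, since re-assignment of this kind can settle in a poor local optimum.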

MeSH terms

Amino Acid Motifs
Cluster Analysis
Computational Biology / methods*
Models, Genetic
Models, Statistical
Multigene Family
Probability
Protein Conformation
Protein Structure, Tertiary
Proteomics / methods
Regression Analysis
Saccharomyces cerevisiae / metabolism
Transcription Factors / chemistry

Substances

Transcription Factors