Data mining for regulatory elements in yeast genome

A Brazma; J Vilo; E Ukkonen; K Valtonen

Proc Int Conf Intell Syst Mol Biol. 1997:5:65-74.

Authors

A Brazma¹, J Vilo, E Ukkonen, K Valtonen

Affiliation

¹ Institute of Mathematics and Computer Science, University of Latvia.

PMID: 9322017

Abstract

We have examined methods and developed a general software tool for finding and analyzing combinations of transcription factor binding sites that occur relatively often in gene upstream regions (putative promoter regions) in the yeast genome. Such frequently occurring combinations may be essential parts of possible promoter classes. The regions upstream of all genes were first isolated from the yeast genome database MIPS using the information in the annotation files of the database. Those that do not overlap with coding regions were chosen for further study. Next, all occurrences of the yeast transcription factor binding sites, as given in the IMD database, were located in the genome and in the selected regions in particular. Finally, by using general-purpose data mining software in combination with our own software, which parametrizes the search, we can find the combinations of binding sites that occur in the upstream regions more frequently than would be expected on the basis of the frequencies of the individual sites. The procedure also finds so-called association rules present in such combinations. The developed tool is available for use through the WWW.
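The last step of the abstract can be illustrated with a minimal sketch. This is not the authors' tool or the data mining package they used. It assumes that "expected on the basis of the frequency of individual sites" means the co-occurrence count predicted if the two sites occurred independently, and it is restricted to pairs of sites. The function name `mine_site_pairs`, the thresholds, and the gene and site labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_site_pairs(regions, min_support=2, min_lift=1.0):
    """Find pairs of binding sites that co-occur more often than expected.

    regions: dict mapping a gene id to the set of binding-site names
             found in that gene's upstream region (hypothetical input format).
    min_support: minimum number of regions that must contain both sites.
    min_lift: minimum ratio of observed to expected co-occurrence.
    """
    n = len(regions)
    single = Counter()  # number of regions containing each site
    pair = Counter()    # number of regions containing each pair of sites
    for sites in regions.values():
        single.update(sites)
        pair.update(combinations(sorted(sites), 2))

    results = []
    for (a, b), observed in pair.items():
        if observed < min_support:
            continue
        # Assumption: under independence, the expected number of regions
        # containing both sites is n * P(a) * P(b).
        expected = single[a] * single[b] / n
        lift = observed / expected
        if lift > min_lift:
            results.append({
                "pair": (a, b),
                "observed": observed,
                "expected": expected,
                "lift": lift,
                # Association rule confidences: a => b and b => a.
                "conf_a_to_b": observed / single[a],
                "conf_b_to_a": observed / single[b],
            })
    results.sort(key=lambda r: r["lift"], reverse=True)
    return results


# Toy input with made-up gene and site labels (not real yeast data).
toy_regions = {
    "g1": {"siteA", "siteB"},
    "g2": {"siteA", "siteB"},
    "g3": {"siteA", "siteB", "siteC"},
    "g4": {"siteC"},
    "g5": {"siteC"},
    "g6": set(),
}

results = mine_site_pairs(toy_regions)
for r in results:
    print(r["pair"], r["observed"], r["expected"], r["lift"])
```

In the toy input, `siteA` and `siteB` each occur in three of six regions and always together, so the pair is observed three times against an expected 1.5, and both association rules have confidence 1.0. A real analysis would extend this to larger combinations and use a statistical test rather than a fixed ratio threshold.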

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites / genetics
Chromosomes, Fungal / genetics
Databases, Factual
Genes, Regulator*
Genome, Fungal*
Open Reading Frames
Promoter Regions, Genetic
Saccharomyces cerevisiae / genetics*
Saccharomyces cerevisiae / metabolism
Software*
Transcription Factors / metabolism

Substances

Transcription Factors