Efficiently finding regulatory elements using correlation with gene expression

J Bioinform Comput Biol. 2004 Jun;2(2):273-88. doi: 10.1142/s0219720004000612.

Abstract

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm looks for motif patterns whose occurrence in the upstream region is most strongly correlated with the expression levels of the genes. The algorithm finds the optimal pattern in time linear in the total length of the upstream sequences. We implement the algorithm and apply it to publicly available microarray gene expression data, and show that our method is able to discover biologically significant motifs, including several motifs that have been reported previously using the same data set. We further discuss applications for which the efficiency of the method is essential, as well as possible extensions to our algorithm.
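The search problem described above can be illustrated with a small brute-force sketch. It is not the linear-time suffix tree algorithm of the paper: it enumerates every substring up to a fixed length and scores each one. The abstract does not name the scoring function, so the score used here (absolute difference in mean expression between genes that contain the pattern and genes that do not) is an assumption chosen for illustration. The names `best_motif`, `upstream`, `expression`, and `max_len` are likewise hypothetical.

```python
from statistics import mean


def best_motif(upstream, expression, max_len=6):
    """Brute-force search for the substring whose presence best separates expression.

    upstream   -- list of upstream DNA sequences, one per gene
    expression -- list of expression levels, aligned with `upstream`
    max_len    -- longest candidate pattern considered (illustrative cap)
    """
    # Map each candidate substring to the set of genes whose upstream region contains it.
    occurrences = {}
    for gene_idx, seq in enumerate(upstream):
        for start in range(len(seq)):
            for end in range(start + 1, min(start + max_len, len(seq)) + 1):
                occurrences.setdefault(seq[start:end], set()).add(gene_idx)

    n = len(upstream)
    best_pattern, best_key = None, (float("-inf"), 0)
    for pattern, genes in occurrences.items():
        if len(genes) == n:
            continue  # present in every gene, so it cannot split the gene set
        with_p = [expression[i] for i in sorted(genes)]
        without_p = [expression[i] for i in range(n) if i not in genes]
        # Assumed score: gap between mean expression of the two groups.
        score = abs(mean(with_p) - mean(without_p))
        # Break ties in favour of the longer (more specific) pattern.
        key = (score, len(pattern))
        if key > best_key:
            best_pattern, best_key = pattern, key
    return best_pattern, best_key[0]


# Toy data: the two highly expressed genes share the substring ACGT.
upstream = ["TTACGTAA", "GGACGTCC", "TTTTGGGG", "CCCCAAAA"]
expression = [5.0, 4.0, 0.5, 1.0]
print(best_motif(upstream, expression))  # ('ACGT', 3.75)
```

This enumeration examines a number of substrings that grows with the sequence length times `max_len`, and it re-scores each candidate from scratch. The abstract's claim is that a generalized suffix tree makes it possible to find the optimal pattern in time linear in the total length of the upstream sequences.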

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics*
  • Base Sequence
  • Gene Expression Profiling / methods*
  • Genes, Regulator / genetics*
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Statistics as Topic