Identification of DNA regulatory motifs using Bayesian variable selection

Bioinformatics. 2004 Nov 1;20(16):2553-61. doi: 10.1093/bioinformatics/bth282. Epub 2004 Apr 29.

Abstract

Motivation: Understanding the mechanisms that determine gene expression regulation is an important and challenging problem. A common approach consists of identifying DNA-binding sites from a collection of co-regulated genes and their nearby non-coding DNA sequences. Here, we consider a regression model that linearly relates gene expression levels to a sequence matching score of nucleotide patterns. We use Bayesian models and stochastic search techniques to select transcription factor binding site candidates, as an alternative to stepwise regression procedures used by other investigators.
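The approach described above can be illustrated with a minimal sketch, under stated assumptions that go beyond the abstract. Expression levels y are regressed on a matrix X of motif matching scores, and a binary inclusion vector gamma marks which motifs enter the model. The abstract does not specify the priors or the search moves, so this sketch swaps in a Zellner g-prior on the coefficients, an independent Bernoulli prior on each inclusion indicator, and a single-flip Metropolis sampler. The function names (`log_bayes_factor`, `stochastic_search`), the parameter values, and the simulated data are all hypothetical and are not taken from the authors' Matlab code.

```python
import numpy as np

def log_bayes_factor(X, y, gamma, g):
    """Log Bayes factor of the model selected by gamma against the
    intercept-only model, under a Zellner g-prior (an assumed prior)."""
    n = len(y)
    q = int(gamma.sum())
    if q == 0:
        return 0.0
    yc = y - y.mean()
    Xc = X[:, gamma] - X[:, gamma].mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - q) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def stochastic_search(X, y, n_iter=3000, burn_in=500, g=None, prior_incl=0.1, seed=0):
    """Single-flip Metropolis search over inclusion vectors.
    Returns the post-burn-in inclusion frequency of each column of X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n) if g is None else g            # unit-information choice g = n
    log_prior_odds = np.log(prior_incl / (1.0 - prior_incl))
    gamma = np.zeros(p, dtype=bool)             # start from the empty model
    current = log_bayes_factor(X, y, gamma, g)
    counts = np.zeros(p)
    for it in range(n_iter):
        j = rng.integers(p)                     # pick one candidate motif
        proposal = gamma.copy()
        proposal[j] = ~proposal[j]              # add it or remove it
        cand = log_bayes_factor(X, y, proposal, g)
        prior_change = log_prior_odds if proposal[j] else -log_prior_odds
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.random()) < cand - current + prior_change:
            gamma, current = proposal, cand
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)

# Simulated data (hypothetical): 200 genes, 30 candidate motif scores,
# three of which truly drive expression.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 30))
true_idx = [2, 11, 25]
y = X[:, true_idx] @ np.array([1.5, -1.0, 2.0]) + data_rng.normal(size=200)

probs = stochastic_search(X, y)
print("motifs ranked by inclusion frequency:", np.argsort(probs)[::-1][:5])
```

Ranking candidates by inclusion frequency, rather than committing to one model as a stepwise procedure does, is what lets the search report uncertainty about each motif. The cutoff used to call a motif "selected" is a further choice left open here.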

Results: We demonstrate through simulated data the improved performance of the Bayesian variable selection method compared to the stepwise procedure. We then analyze and discuss the results from experiments involving well-studied pathways of Saccharomyces cerevisiae and Schizosaccharomyces pombe. We identify regulatory motifs known to be related to the experimental conditions considered. Some of our selected motifs are also in agreement with recent findings by other researchers. In addition, our results include novel motifs that constitute promising sets for further assessment.

Availability: The Matlab code for implementing the Bayesian variable selection method may be obtained from the corresponding author.

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics
  • Bayes Theorem
  • Chromosome Mapping / methods
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Fungal / genetics
  • Genes, Regulator / genetics*
  • Genetic Variation
  • Models, Genetic*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Saccharomyces cerevisiae / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software