A Bayesian approach to joint modeling of protein-DNA binding, gene expression and sequence data

Stat Med. 2010 Feb 20;29(4):489-503. doi: 10.1002/sim.3815.

Abstract

Genome-wide protein-DNA binding data, DNA sequence data and gene expression data offer complementary means of deciphering global and local transcriptional regulatory circuits. Combining these data types can not only improve statistical power but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model that augments protein-DNA binding data with gene expression and DNA sequence data when these are available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulation to draw inferences. Both simulation studies and an analysis of an experimental data set show that the proposed joint modeling method can significantly improve the specificity and sensitivity of target gene identification compared with conventional approaches that rely on a single data source.
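The abstract does not give the model's form, so the following is only a toy sketch of the general idea of combining two data sources through a shared latent indicator and sampling it by MCMC. It is not the authors' model. Everything in it is an assumption made for illustration: the function names, the unit-variance normal likelihoods, the fixed effect sizes `mu_b` and `mu_e`, and the Beta(1, 1) prior on the target fraction. Each gene has a hidden target indicator, and a Gibbs sampler alternates between the target fraction and the indicators.

```python
import math
import random


def normal_logpdf(x, mean, sd=1.0):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def gibbs_target_probs(binding, expression, mu_b=2.0, mu_e=1.5,
                       n_iter=2000, burn_in=500, seed=0):
    """Estimate P(gene is a target | data) in a toy two-source mixture.

    Assumed toy model (not taken from the paper):
      z_i ~ Bernoulli(pi), pi ~ Beta(1, 1)
      binding_i    ~ Normal(mu_b * z_i, 1)
      expression_i ~ Normal(mu_e * z_i, 1)
    """
    rng = random.Random(seed)
    n = len(binding)
    z = [rng.random() < 0.5 for _ in range(n)]  # random initial indicators
    counts = [0] * n
    for it in range(n_iter):
        # Conjugate update: pi | z ~ Beta(1 + #targets, 1 + #non-targets).
        k = sum(z)
        pi = rng.betavariate(1 + k, 1 + n - k)
        pi = min(max(pi, 1e-12), 1 - 1e-12)  # keep the logs below finite
        for i in range(n):
            # Both data sources contribute to the same indicator z_i.
            log1 = (math.log(pi)
                    + normal_logpdf(binding[i], mu_b)
                    + normal_logpdf(expression[i], mu_e))
            log0 = (math.log(1 - pi)
                    + normal_logpdf(binding[i], 0.0)
                    + normal_logpdf(expression[i], 0.0))
            p1 = 1.0 / (1.0 + math.exp(log0 - log1))
            z[i] = rng.random() < p1
            if it >= burn_in:
                counts[i] += z[i]
    # Fraction of retained draws in which each gene was labeled a target.
    return [c / (n_iter - burn_in) for c in counts]
```

Calling `gibbs_target_probs(binding, expression)` on two equal-length lists of per-gene scores returns one estimated posterior probability per gene. A single-source analysis in this sketch would correspond to dropping one of the two likelihood terms, which is one way to see informally why a second data source can sharpen the classification.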

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • DNA / genetics*
  • DNA / metabolism
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Gene Expression
  • Gene Expression Regulation, Bacterial*
  • Leucine-Responsive Regulatory Protein / genetics
  • Leucine-Responsive Regulatory Protein / metabolism*
  • Markov Chains
  • Models, Statistical
  • Monte Carlo Method
  • Protein Binding*
  • Regulon
  • Sequence Analysis, DNA / statistics & numerical data*

Substances

  • Leucine-Responsive Regulatory Protein
  • DNA