Nucleosome positioning from tiling microarray data

Bioinformatics. 2008 Jul 1;24(13):i139-46. doi: 10.1093/bioinformatics/btn151.

Abstract

Motivation: The packaging of DNA around nucleosomes in eukaryotic cells plays a crucial role in the regulation of gene expression and other DNA-related processes. To better understand the regulatory role of nucleosomes, it is important to pinpoint their positions at high (5-10 bp) resolution. Toward this end, several recent works used dense tiling arrays to map nucleosomes in a high-throughput manner. These data were then parsed and hand-curated, and the positions of nucleosomes were assessed.

Results: In this manuscript, we present a fully automated algorithm to analyze such data and predict the exact location of nucleosomes. We introduce a method, based on a probabilistic graphical model, to increase the resolution of our predictions even beyond that of the microarray used. We show how to build such a model and how to compile it into a simple Hidden Markov Model, allowing for fast and accurate inference of nucleosome positions. We applied our model to nucleosomal data from mid-log yeast cells reported by Yuan et al. and compared our predictions to those of the original paper; to those of a more recent method, described by Lee et al., that uses five-times denser tiling arrays; and to a curated set of literature-based nucleosome positions. Our results suggest that, applied to the same data used by Yuan et al., our fully automated model traced 13% more nucleosomes and increased overall accuracy by about 20%. We believe that such an improvement opens the way for a better understanding of the regulatory mechanisms controlling gene expression, and how they are encoded in the DNA.
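
Illustration: The abstract does not spell out the model, so the following is only a generic sketch of how a Hidden Markov Model can label tiling-array probes, not the authors' implementation. It assumes a hypothetical two-state chain (linker, nucleosome), Gaussian emissions over per-probe log-ratio signals, and made-up parameter values; the names `viterbi`, `gaussian_logpdf`, `emit_params` and the example signals are all illustrative. Decoding uses the standard Viterbi algorithm.

```python
import math

# Hypothetical state indices: 0 = linker, 1 = nucleosome (an assumption,
# not the state space used in the paper).
STATES = ("linker", "nucleosome")


def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, log_start, log_trans, emit_params):
    """Return the most probable state path for a sequence of probe signals."""
    n_states = len(log_start)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + gaussian_logpdf(obs[0], *emit_params[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + gaussian_logpdf(x, *emit_params[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative parameters (assumed, not taken from the paper).
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
emit_params = [(-1.0, 0.5), (1.0, 0.5)]  # (mean, sd) per state

# Made-up log-ratio signals for eight consecutive probes.
signals = [-1.0, -0.9, 1.1, 0.9, 1.0, 1.2, -1.1, -0.8]
path = viterbi(signals, log_start, log_trans, emit_params)
print([STATES[s] for s in path])
```

With these toy values the decoded path marks the four high-signal probes as nucleosomal and the flanking probes as linker. A model that resolves positions more finely than the probe spacing, as the abstract describes, would need a richer state space than this two-state sketch.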

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Base Sequence
  • Markov Chains
  • Models, Genetic*
  • Molecular Sequence Data
  • Nucleosomes / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • Nucleosomes