hiHMM: Bayesian non-parametric joint inference of chromatin state maps

Bioinformatics. 2015 Jul 1;31(13):2066-74. doi: 10.1093/bioinformatics/btv117. Epub 2015 Feb 27.

Abstract

Motivation: Genome-wide mapping of chromatin states is essential for defining regulatory elements and inferring their activities in eukaryotic genomes. A number of hidden Markov model (HMM)-based methods have been developed to infer chromatin state maps from genome-wide histone modification data for an individual genome. A principled comparison of evolutionarily distant epigenomes, however, must also account for species-specific biases such as differences in genome size, strength of signal enrichment and co-occurrence patterns of histone modifications.

Results: Here, we present a new Bayesian non-parametric method called hierarchically linked infinite HMM (hiHMM) to jointly infer chromatin state maps in multiple genomes (different species, cell types and developmental stages) using genome-wide histone modification data. This flexible framework provides a new way to learn a consistent definition of chromatin states across multiple genomes, thus facilitating a direct comparison among them. We demonstrate the utility of this method using synthetic data as well as multiple modENCODE ChIP-seq datasets.

Conclusion: The hierarchical and Bayesian non-parametric formulation in our approach is an important extension to the current set of methodologies for comparative chromatin landscape analysis.

Availability and implementation: Source code is available at https://github.com/kasohn/hiHMM. Chromatin data are available at http://encode-x.med.harvard.edu/data_sets/chromatin/.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bayes Theorem*
  • Chromatin / genetics*
  • Chromatin Immunoprecipitation
  • Computational Biology / methods
  • Drosophila melanogaster / genetics
  • Gene Expression Regulation, Developmental
  • Histones / metabolism
  • Humans
  • Promoter Regions, Genetic
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Software*
  • Statistics, Nonparametric*

Substances

  • Chromatin
  • Histones