EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences

Nucleic Acids Res. 2019 Jul 26;47(13):e77. doi: 10.1093/nar/gkz287.

Abstract

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm whose novelty lies in accounting for the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it can serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
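
To make the idea of a local alignment over chromatin state sequences more concrete, the following is a minimal illustrative sketch and not the published EpiAlign implementation. It assumes that each chromatin state is encoded as a single character, that consecutive identical states are collapsed into runs so that run length can enter the score, and that a matched state is weighted by the negative log of its frequency so that rarer states contribute more. The function names (compress, state_weights, local_align_score), the length-ratio term, and the mismatch and gap penalties are all hypothetical choices made for illustration; the recurrence itself is the standard Smith-Waterman local alignment.

```python
import math
from collections import Counter
from itertools import groupby


def compress(states):
    """Run-length encode a state sequence: 'AAABBC' -> [('A', 3), ('B', 2), ('C', 1)]."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def state_weights(*sequences):
    """Assumed weighting: -log(frequency) of each state among the compressed runs."""
    counts = Counter(state for seq in sequences for state, _ in compress(seq))
    total = sum(counts.values())
    return {state: -math.log(count / total) for state, count in counts.items()}


def local_align_score(seq_a, seq_b, mismatch=-1.0, gap=-1.0):
    """Best Smith-Waterman local alignment score between two compressed sequences.

    The match score (frequency weight times the ratio of the shorter to the
    longer run) is an illustrative assumption, not the EpiAlign scoring scheme.
    """
    runs_a, runs_b = compress(seq_a), compress(seq_b)
    weights = state_weights(seq_a, seq_b)
    prev = [0.0] * (len(runs_b) + 1)  # previous row of the DP matrix
    best = 0.0
    for state_a, len_a in runs_a:
        cur = [0.0]
        for j, (state_b, len_b) in enumerate(runs_b, start=1):
            if state_a == state_b:
                sub = weights[state_a] * min(len_a, len_b) / max(len_a, len_b)
            else:
                sub = mismatch
            # Local alignment: scores are floored at zero so an alignment can restart.
            score = max(0.0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    # Hypothetical state strings; each letter stands for one chromatin state.
    print(local_align_score("AAABBC", "AAABBC"))
    print(local_align_score("AAABBC", "AAABCC"))
    print(local_align_score("AAAA", "BBBB"))
```

Under these assumptions, two identical sequences score higher than two sequences that share the same states with different run lengths, and sequences with no state in common score zero. Any real analysis should rely on the scoring scheme defined in the paper itself.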

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Brain Chemistry
  • Chromatin / genetics
  • Chromatin / ultrastructure*
  • Computational Biology / methods*
  • DNA Methylation
  • Databases, Genetic
  • Datasets as Topic
  • Epigenomics / methods*
  • Gene Ontology
  • Humans
  • Nerve Tissue Proteins / biosynthesis
  • Nerve Tissue Proteins / chemistry
  • Nerve Tissue Proteins / genetics
  • Sequence Alignment*
  • Software

Substances

  • Chromatin
  • Nerve Tissue Proteins