PyMEGABASE: Predicting Cell-Type-Specific Structural Annotations of Chromosomes Using the Epigenome

J Mol Biol. 2023 Aug 1;435(15):168180. doi: 10.1016/j.jmb.2023.168180. Epub 2023 Jun 9.

Abstract

The folding patterns of interphase genomes in higher eukaryotes, as obtained from DNA-proximity-ligation or Hi-C experiments, are used to classify loci into structural classes called compartments and subcompartments. These structurally annotated (sub)compartments are known to exhibit specific epigenomic characteristics and cell-type-specific variations. To explore the relationship between genome structure and the epigenome, we present PyMEGABASE (PYMB), a maximum-entropy-based neural network model that predicts (sub)compartment annotations of a locus based solely on the local epigenome, such as ChIP-Seq of histone post-translational modifications. PYMB builds upon our previous model while improving its robustness, its ability to handle diverse inputs, and the user-friendliness of its implementation. We employed PYMB to predict subcompartments for over a hundred human cell types available in ENCODE, shedding light on the links between subcompartments, cell identity, and epigenomic signals. The fact that PYMB, trained on data for human cells, can accurately predict compartments in mice suggests that the model is learning underlying physicochemical principles transferable across cell types and species. Reliable at higher resolutions (up to 5 kbp), PYMB is used to investigate compartment-specific gene expression. Not only can PYMB generate (sub)compartment information without Hi-C experiments, but its predictions are also interpretable. Analyzing PYMB's trained parameters, we explore the importance of various epigenomic marks in each subcompartment prediction. Furthermore, the predictions of the model can be used as input for OpenMiChroM software, which has been calibrated to generate three-dimensional structures of the genome. Detailed documentation of PYMB is available at https://pymegabase.readthedocs.io, including an installation guide using pip or conda, and Jupyter/Colab notebook tutorials.

Keywords: Epigenetics; Genome organization; Chromatin subcompartments; Maximum entropy.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Chromatin
  • Chromosomes* / metabolism
  • Databases, Genetic*
  • Epigenome* / genetics
  • Histones / metabolism
  • Humans
  • Mice
  • Neural Networks, Computer
  • Software

Substances

  • Chromatin
  • Histones