Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data

Nat Commun. 2020 Mar 3;11(1):1173. doi: 10.1038/s41467-020-14974-x.

Abstract

Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments from Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirm that SCI sub-compartment prediction outperforms the HMM approach. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. In addition, we present a deep neural network that predicts sub-compartments from epigenome, replication timing, and sequence data. This neural network produces more accurate sub-compartment predictions when SCI-determined sub-compartments are used as labels for training.
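
The two-stage idea named in the abstract (embed the interaction graph, then cluster the embedded nodes without labels) can be illustrated with a minimal sketch. The abstract does not state which embedding or clustering method SCI uses, so this example substitutes a spectral embedding of the normalized contact matrix and a plain k-means; both are stand-ins chosen for brevity, not the authors' method. The function names, the toy 20-bin contact matrix, and the choice of two clusters are all hypothetical.

```python
import numpy as np

def spectral_embedding(contacts, dim):
    """Embed genomic bins (graph nodes) with the top eigenvectors of the
    symmetrically normalized contact matrix. Stand-in for SCI's embedding."""
    a = np.asarray(contacts, dtype=float)
    a = (a + a.T) / 2.0                      # enforce symmetry
    deg = a.sum(axis=1)
    deg[deg == 0] = 1.0                      # avoid division by zero for empty bins
    inv_sqrt = 1.0 / np.sqrt(deg)
    norm = a * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm)           # eigenvalues in ascending order
    return vecs[:, -dim:]                    # keep the `dim` leading eigenvectors

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Stand-in for the unsupervised learning step."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):                 # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return labels

# Hypothetical toy contact matrix: 20 bins in two groups that interact
# strongly within a group (10) and weakly between groups (1).
contacts = np.ones((20, 20))
contacts[:10, :10] = 10.0
contacts[10:, 10:] = 10.0

embedding = spectral_embedding(contacts, dim=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy matrix, bins within the same strongly interacting group receive the same cluster label; which group is numbered 0 or 1 is arbitrary. Real Hi-C data would need normalization and a principled choice of embedding dimension and cluster count, none of which the abstract specifies.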

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatin / genetics*
  • Chromatin / metabolism
  • Cluster Analysis
  • Computer Graphics*
  • Data Analysis
  • Epigenome
  • Gene Expression
  • Genomics / methods*
  • Humans
  • K562 Cells
  • Markov Chains
  • Neural Networks, Computer*
  • Reproducibility of Results
  • Unsupervised Machine Learning

Substances

  • Chromatin