Improving comparative analyses of Hi-C data via contrastive self-supervised learning

Brief Bioinform. 2023 Jul 20;24(4):bbad193. doi: 10.1093/bib/bbad193.

Abstract

Hi-C is a widely applied chromosome conformation capture (3C)-based technique that has produced a large number of genomic contact maps at high sequencing depths for a wide range of cell types, enabling comprehensive analyses of the relationships between biological functions (e.g. gene regulation and expression) and three-dimensional genome structure. Comparative analyses, which compare Hi-C contact maps with one another, play a significant role in Hi-C studies: they are used to evaluate the consistency of replicate Hi-C experiments (i.e. reproducibility measurement) and to detect statistically differential interacting regions with biological significance (i.e. differential chromatin interaction detection). However, due to the complex and hierarchical nature of Hi-C contact maps, it remains challenging to conduct systematic and reliable comparative analyses of Hi-C data. Here, we propose sslHiC, a contrastive self-supervised representation learning framework that precisely models the multi-level features of chromosome conformation and automatically produces informative feature embeddings for genomic loci and their interactions to facilitate comparative analyses of Hi-C contact maps. Comprehensive computational experiments on both simulated and real datasets demonstrated that our method consistently outperformed state-of-the-art baseline methods in providing reliable measurements of reproducibility and detecting biologically meaningful differential interactions.
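
As a rough illustration of the kind of contrastive objective a self-supervised framework like this may rely on, the sketch below computes an InfoNCE-style loss between two sets of locus embeddings, where row i of each matrix is assumed to come from two views of the same genomic locus. It is not taken from sslHiC: the abstract does not specify the loss or the encoder, and the function name info_nce_loss, the temperature value and the use of NumPy are assumptions made purely for illustration.

```python
import numpy as np


def info_nce_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style contrastive loss (hypothetical helper, not from sslHiC).

    view_a, view_b: arrays of shape (n_loci, dim). Row i of each array is
    assumed to embed the same locus under two different views.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature (assumed value).
    logits = a @ b.T / temperature

    # Subtract the row maximum for numerical stability, then take a
    # row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Matching loci sit on the diagonal; the loss is their mean negative
    # log-probability against all other loci in the batch.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))                   # toy locus embeddings
    z_aug = z + 0.01 * rng.normal(size=z.shape)    # slightly perturbed view
    print("aligned views :", info_nce_loss(z, z_aug))
    print("shuffled views:", info_nce_loss(z, z_aug[::-1]))
```

In this toy setting the loss should be lower when the two views are correctly paired than when the pairing is shuffled, which is the property a contrastive encoder is trained to exploit.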

Keywords: Hi-C; chromosome conformation; contrastive learning; differential chromatin interaction; graph neural network; reproducibility measurement.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatin* / genetics
  • Chromosomes* / genetics
  • Genomics / methods
  • Reproducibility of Results
  • Supervised Machine Learning

Substances

  • Chromatin