A generalizable Hi-C foundation model for chromatin architecture, single-cell and multi-omics analysis across species

bioRxiv [Preprint]. 2024 Dec 20:2024.12.16.628821. doi: 10.1101/2024.12.16.628821.

Abstract

Nuclear DNA is organized into a compact three-dimensional (3D) structure that impacts critical cellular processes. High-throughput chromosome conformation capture (Hi-C) is the most widely used method for measuring 3D genome architecture, while linear epigenomic assays, such as ATAC-seq, DNase-seq, and ChIP-seq, are extensively employed to characterize epigenomic regulation. However, the integrative analysis of chromatin interactions and associated epigenomic regulation remains challenging due to the pairwise nature of Hi-C data, mismatched resolution between Hi-C and epigenomic assays, and inconsistencies among analysis tools. Here we propose HiCFoundation, a Hi-C-based foundation model for integrative analysis linking chromatin structure to downstream regulatory function. HiCFoundation is trained on hundreds of Hi-C assays encompassing 118 million contact matrix submatrices. The model achieves state-of-the-art performance on multiple types of 3D genome analysis, including reproducibility analysis, resolution enhancement, and loop detection. We further demonstrate the model's generalizability through genome architecture analysis of 316 species. Notably, by enhancing low-coverage experimental Hi-C data, HiCFoundation reveals genome-wide loop loss during differentiation of hematopoietic stem and progenitor cells (HSPCs) to neutrophils. Additionally, HiCFoundation is able to predict multiple types of epigenomic activity from Hi-C input and further interprets the link between Hi-C input and epigenomic output to reveal the relationship between chromatin conformation and genome function. Finally, HiCFoundation can analyze single-cell Hi-C data, shedding light on genome structure at single-cell resolution. HiCFoundation thus provides a unified, efficient, generalizable, and interpretable foundation for genome architecture, single-cell and multi-omics analysis across species, paving the way for systematically studying 3D genome architecture and its regulatory mechanisms.

Publication types

  • Preprint