Mapping enhancer-gene regulatory interactions from single-cell data

bioRxiv [Preprint]. 2024 Nov 24:2024.11.23.624931. doi: 10.1101/2024.11.23.624931.

Abstract

Mapping enhancers and their target genes in specific cell types is crucial for understanding gene regulation and human disease genetics. However, accurately predicting enhancer-gene regulatory interactions from single-cell datasets has been challenging. Here, we introduce a new family of classification models, scE2G, to predict enhancer-gene regulation. These models use features from single-cell ATAC-seq or multiomic RNA and ATAC-seq data and are trained on a CRISPR perturbation dataset including >10,000 evaluated element-gene pairs. We benchmark scE2G models against CRISPR perturbations, fine-mapped eQTLs, and GWAS variant-gene associations and demonstrate state-of-the-art performance at prediction tasks across multiple cell types and categories of perturbations. We apply scE2G to build maps of enhancer-gene regulatory interactions in heterogeneous tissues and interpret noncoding variants associated with complex traits, nominating regulatory interactions linking INPP4B and IL15 to lymphocyte counts. The scE2G models will enable accurate mapping of enhancer-gene regulatory interactions across thousands of diverse human cell types.

Publication types

  • Preprint