Deoxyribonucleic acid (DNA) methylation (DNAm) is an important epigenetic mechanism that plays a role in chromatin structure and transcriptional regulation. Elucidating the relationship between DNAm and gene expression is of great importance for understanding its role in transcriptional regulation. The conventional approach is to build machine-learning models that predict gene expression from mean methylation signals in promoter regions. However, this type of strategy explains only about 25% of gene expression variation, and hence is inadequate for elucidating the relationship between DNAm and transcriptional activity. In addition, using mean methylation as the input feature neglects the heterogeneity of cell populations that can be reflected by DNAm haplotypes. Here, we developed TRAmHap, a novel deep-learning framework that predicts gene expression by utilizing the characteristics of DNAm haplotypes in proximal promoters and distal enhancers. On benchmark data from normal human and mouse tissues, TRAmHap shows much higher accuracy than existing machine-learning-based methods, explaining 60–80% of gene expression variation across tissue types and disease conditions. Our model demonstrated that gene expression can be accurately predicted from DNAm patterns in promoters and in long-range enhancers as far as 25 kb from the transcription start site, especially in the presence of intra-gene chromatin interactions.
Keywords: DNA methylation haplotypes; deep learning; enhancer; gene expression.
© The Author(s) 2023. Published by Oxford University Press. All rights reserved.