Transcriptome-wide association studies (TWAS) help identify disease-causing genes but often fail to pinpoint disease mechanisms at the cellular level, because cell-type-specific expression data are limited in sample size and sparse. Here we propose scPrediXcan, which integrates state-of-the-art deep learning approaches that predict epigenetic features from DNA sequence with the canonical TWAS framework. Our prediction approach, ctPred, predicts cell-type-specific expression with high accuracy and captures complex gene regulatory grammar that linear models overlook. Applied to type 2 diabetes and systemic lupus erythematosus, scPrediXcan outperformed the canonical TWAS framework: it identified more candidate causal genes, explained more genome-wide association study (GWAS) loci, and provided insights into the cellular specificity of TWAS hits. Overall, our results demonstrate that scPrediXcan represents a significant advance and promises to deepen our understanding of the cellular mechanisms underlying complex diseases.
Keywords: Deep learning; GWAS; Single-cell; Systemic lupus erythematosus; TWAS; Type 2 diabetes.