A deep learning framework for predicting disease-gene associations with functional modules and graph augmentation

BMC Bioinformatics. 2024 Jun 14;25(1):214. doi: 10.1186/s12859-024-05841-3.

Abstract

Background: The exploration of gene-disease associations is crucial for understanding the mechanisms underlying disease onset and progression, with significant implications for prevention and treatment strategies. Advances in high-throughput biotechnology have generated a wealth of data linking diseases to specific genes. While graph representation learning has recently introduced groundbreaking approaches for predicting novel associations, existing studies have often overlooked the cumulative impact of functional modules such as protein complexes and the incompleteness of some important data such as protein interactions, which limits prediction performance.

Results: To address these limitations, we introduce ModulePred, a deep learning framework for predicting disease-gene associations. ModulePred performs graph augmentation on the protein interaction network using L3 link prediction algorithms. It builds a heterogeneous module network by integrating disease-gene associations, protein complexes and augmented protein interactions, and develops a novel graph embedding for this network. A graph neural network is then constructed to learn node representations by collectively aggregating information from the topological structure, and gene prioritization is carried out using the disease and gene embeddings obtained from the graph neural network. Experimental results underscore the superiority of ModulePred and demonstrate the effectiveness of incorporating functional modules and graph augmentation in predicting disease-gene associations. This research introduces innovative ideas and directions, enhancing the understanding and prediction of gene-disease relationships.
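
To illustrate the graph augmentation step, here is a minimal sketch of L3 link prediction in its commonly published form: each non-interacting protein pair is scored by the number of length-3 paths connecting it, with every path down-weighted by the degrees of its two intermediate nodes. This is not the ModulePred implementation. The function names, the toy network and the `top_k` cutoff are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Score non-adjacent node pairs by degree-normalized paths of length 3."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already an observed interaction
        total = 0.0
        # Enumerate paths x - u - v - y and weight each by 1 / sqrt(k_u * k_v).
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


def augment(edges, top_k):
    """Return the top_k highest-scoring candidate interactions (hypothetical cutoff)."""
    ranked = sorted(l3_scores(edges).items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:top_k]]


# Toy chain network A - B - C - D: the only length-3 path links A and D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_scores = l3_scores(toy_edges)
candidates = augment(toy_edges, top_k=1)
```

In the toy network, the pair (A, D) is the only candidate, and its single path runs through two nodes of degree 2. Real protein interaction networks would need a sparse-matrix formulation for efficiency, and the abstract does not state how many predicted interactions are added.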

Keywords: Deep learning; Gene-disease associations; Graph augmentation; Graph neural networks; Protein complexes.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Deep Learning*
  • Genetic Association Studies / methods
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Neural Networks, Computer
  • Protein Interaction Maps / genetics