scGraph2Vec: a deep generative model for gene embedding augmented by graph neural network and single-cell omics data

Gigascience. 2024 Jan 2:13:giae108. doi: 10.1093/gigascience/giae108.

Abstract

Background: Studying the cellular processes in which genes participate through the lens of biological networks is of great interest for understanding the properties of complex diseases and biological systems. Biological networks, such as protein-protein interaction networks and gene regulatory networks, provide insights into the molecular basis of cellular processes and often form functional clusters in different tissue and disease contexts.

Results: We present scGraph2Vec, a deep learning framework for generating informative gene embeddings. scGraph2Vec extends the variational graph autoencoder framework and integrates single-cell datasets with gene-gene interaction networks. We demonstrate that the gene embeddings are biologically interpretable and enable the identification of gene clusters representing functional or tissue-specific cellular processes. In comparisons with similar tools, scGraph2Vec clearly distinguished different gene clusters and aggregated more biologically functional genes. scGraph2Vec can be applied in diverse biological contexts. We show that the embeddings it generates can be used to infer disease-associated genes from genome-wide association study data (e.g., COVID-19 and Alzheimer's disease), identify additional driver genes in lung adenocarcinoma, and reveal regulatory genes responsible for maintaining or transitioning melanoma cell states.
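The variational graph autoencoder idea that scGraph2Vec extends can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, loss weighting, or input preprocessing, so everything below (the two-layer graph-convolution encoder, the inner-product decoder, the layer sizes, the toy 6-gene network, and the names `normalize_adjacency` and `vgae_forward`) is an assumption chosen for illustration. The weights are random and untrained; the sketch only shows how a gene-gene adjacency matrix and a per-gene expression matrix could be combined into latent gene embeddings and a reconstructed network.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard graph-convolution layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, features, hidden_dim=16, latent_dim=8, seed=0):
    """One forward pass of a generic VGAE with random (untrained) weights.

    adj      : (n_genes, n_genes) binary symmetric gene-gene network
    features : (n_genes, n_features) per-gene features, e.g. expression
               across cells (an assumption about the input layout)
    Returns the sampled embeddings, the reconstructed edge probabilities,
    and the reconstruction-plus-KL loss.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_features = features.shape
    a_norm = normalize_adjacency(adj)

    # Hypothetical layer weights; a real model would learn these.
    w0 = rng.normal(scale=0.1, size=(n_features, hidden_dim))
    w_mu = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))
    w_logvar = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))

    # Encoder: shared graph-convolution layer, then mean and log-variance heads.
    hidden = np.maximum(a_norm @ features @ w0, 0.0)   # ReLU
    mu = a_norm @ hidden @ w_mu
    logvar = a_norm @ hidden @ w_logvar

    # Reparameterization trick: z = mu + sigma * noise.
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

    # Inner-product decoder: edge probability = sigmoid(z_i . z_j).
    adj_recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))

    # Loss: binary cross-entropy on edges plus KL divergence to N(0, I).
    eps = 1e-9
    bce = -np.mean(adj * np.log(adj_recon + eps)
                   + (1.0 - adj) * np.log(1.0 - adj_recon + eps))
    kl = -0.5 * np.mean(np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1))
    return z, adj_recon, bce + kl / n_genes


# Toy example: 6 hypothetical genes, 4 hypothetical cells.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
features = np.random.default_rng(1).random((6, 4))

z, adj_recon, loss = vgae_forward(adj, features)
print("embedding shape:", z.shape)
print("loss is finite:", bool(np.isfinite(loss)))
```

In a trained model of this kind, the rows of `z` would serve as the gene embeddings, and downstream steps such as clustering or gene prioritization would operate on them.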

Conclusions: scGraph2Vec not only reconstructs tissue-specific gene networks but also learns a latent representation of genes that reflects their biological functions.

Keywords: complex disease; gene embedding; gene regulatory network; single-cell RNA-seq; tissue specificity.

MeSH terms

  • Algorithms
  • Alzheimer Disease / genetics
  • COVID-19 / genetics
  • COVID-19 / virology
  • Computational Biology / methods
  • Deep Learning*
  • Gene Regulatory Networks*
  • Genome-Wide Association Study / methods
  • Humans
  • Lung Neoplasms / genetics
  • Neural Networks, Computer
  • Protein Interaction Maps / genetics
  • Single-Cell Analysis* / methods