Predicting noncoding RNA and disease associations using multigraph contrastive learning

Sci Rep. 2025 Jan 2;15(1):230. doi: 10.1038/s41598-024-81862-5.

Abstract

MiRNAs and lncRNAs are two essential noncoding RNAs. Predicting associations between noncoding RNAs and diseases can significantly improve the accuracy of early diagnosis. With continuing breakthroughs in artificial intelligence, researchers increasingly use deep learning methods to predict these associations. Nevertheless, most existing methods have two major limitations: low prediction accuracy and the ability to predict only a single type of noncoding RNA-disease association. To address these challenges, this paper proposes a method called K-Means and multigraph Contrastive Learning for predicting associations among miRNAs, lncRNAs, and diseases (K-MGCMLD). The K-MGCMLD model comprises four main steps. The first step constructs a heterogeneous graph. The second step performs downsampling with the K-means clustering algorithm to balance the positive and negative samples. The third step uses an encoder with a Graph Convolutional Network (GCN) architecture to extract embedding vectors; multigraph contrastive learning, including both local and global graph contrastive learning, helps the embedding vectors better capture the latent topological features of the graph. The fourth step reconstructs features from the balanced positive and negative samples and the embedding vectors, and the reconstructed features are fed into an XGBoost classifier for multi-association classification prediction. Experimental results show AUC values of 0.9542 for miRNA-disease association, 0.9603 for lncRNA-disease association, and 0.9687 for lncRNA-miRNA association. In addition, case analyses using K-MGCMLD validated the associations of all of the top 30 miRNAs predicted to be associated with lung cancer and Alzheimer's disease.
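The second step, K-means downsampling to balance positive and negative samples, can be illustrated with a minimal sketch. The abstract does not say how negatives are drawn from the clusters, so the round-robin selection below is an assumption, as are the function names `kmeans` and `downsample_negatives`, the cluster count `k=3`, and the toy two-dimensional features. It uses a plain Lloyd's-algorithm K-means from the standard library rather than the authors' implementation.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def downsample_negatives(neg_features, n_keep, k=3, seed=0):
    """Pick up to n_keep negative indices, round-robin across k clusters.

    Assumption: drawing evenly from each cluster keeps the retained
    negatives representative of the full negative set.
    """
    labels = kmeans(neg_features, k, seed=seed)
    rng = random.Random(seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    chosen = []
    while len(chosen) < n_keep and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_keep:
                chosen.append(members.pop())
    return sorted(chosen)


# Toy data: 60 unlabeled (negative) pairs, 12 known positive pairs.
rng = random.Random(1)
negatives = [[rng.random(), rng.random()] for _ in range(60)]
n_positives = 12
kept = downsample_negatives(negatives, n_keep=n_positives)
print(len(kept), "negatives kept to match", n_positives, "positives")
```

In a pipeline like the one described, the retained negatives and the known positives would then form the balanced training set for the downstream classifier; how the features for each pair are built is not specified in the abstract.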

Keywords: Diseases; Graph contrastive learning; Heterogeneous graph; MiRNAs; Multi-association prediction; lncRNAs.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Deep Learning
  • Genetic Predisposition to Disease
  • Humans
  • MicroRNAs* / genetics
  • RNA, Long Noncoding* / genetics
  • RNA, Untranslated / genetics

Substances

  • RNA, Long Noncoding
  • MicroRNAs
  • RNA, Untranslated