A multi-source molecular network representation model for protein-protein interactions prediction

Sci Rep. 2024 Mar 14;14(1):6184. doi: 10.1038/s41598-024-56286-w.

Abstract

The prediction of potential protein-protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified many potential PPIs in recent years, but the problem is still far from solved. Hence, there is an urgent need to develop accurate and efficient computational models to predict potential PPIs. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein-protein interactions. Specifically, we first extract protein sequence features from the physicochemical properties of amino acids using the auto covariance method. Second, a multi-source association network is constructed by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method DeepWalk is then adopted to extract the multi-source association information of proteins with other biomolecules. In this way, each known protein-protein interaction pair can be represented as a concatenation of the protein sequence features and the multi-source association features of its proteins. Finally, a Random Forest classifier with its corresponding optimal parameters is used for training and prediction. Under five-fold cross-validation, MultiPPIs achieves an average prediction accuracy of 86.03%, a sensitivity of 82.69%, and an AUC of 93.03%. The experimental results indicate that MultiPPIs has good prediction performance and provides valuable insights into the prediction of potential protein-protein interactions. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs .
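
The sequence-feature and concatenation steps described above can be illustrated with a minimal sketch. It is not taken from the MultiPPIs repository and rests on several assumptions: the Kyte-Doolittle hydropathy index stands in for whatever physicochemical properties the authors actually use, the scale is not normalized, only one property is used, and the function names (auto_covariance, pair_features), the lag range, the toy sequences, and the short embedding vectors standing in for DeepWalk output are all hypothetical. For a single property, the auto covariance at a given lag is taken here as the average product of mean-centered property values for residues that lie that many positions apart.

```python
# Kyte-Doolittle hydropathy index; a stand-in scale chosen for illustration,
# not necessarily one of the properties used by MultiPPIs.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] of one property along a sequence."""
    values = [scale[aa] for aa in sequence]
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        # Average product of centered values that sit `lag` residues apart.
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def pair_features(seq_a, seq_b, emb_a, emb_b, scale=HYDROPATHY, max_lag=3):
    """Concatenate sequence and network features for a protein pair.

    emb_a and emb_b are hypothetical node embeddings (e.g. from DeepWalk);
    they are passed in ready-made rather than learned here.
    """
    return (auto_covariance(seq_a, scale, max_lag) + list(emb_a)
            + auto_covariance(seq_b, scale, max_lag) + list(emb_b))


# Toy sequences and two-dimensional embeddings, made up for illustration.
vec = pair_features("MKTAYIAKQR", "GSHMLEDPVA", [0.1, -0.2], [0.3, 0.0])
print(len(vec))  # 2 * (3 lags + 2 embedding dims) = 10
```

A vector built this way has a fixed length regardless of sequence length, which is what allows pairs of proteins of different sizes to be fed to a single classifier such as a Random Forest.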

Keywords: Graph representation learning; Multi-source molecular network; Protein–protein interactions; Random forest.

MeSH terms

  • Amino Acid Sequence
  • Amino Acids
  • Computational Biology / methods
  • MicroRNAs*
  • Proteins / metabolism
  • RNA, Long Noncoding*

Substances

  • Proteins
  • MicroRNAs
  • Amino Acids
  • RNA, Long Noncoding