A novel framework for phage-host prediction via logical probability theory and network sparsification

Brief Bioinform. 2024 Nov 22;26(1):bbae708. doi: 10.1093/bib/bbae708.

Abstract

Bacterial resistance has emerged as one of the greatest threats to human health, and phages have shown tremendous potential in addressing the issue of drug-resistant bacteria by lysing host. The identification of phage-host interactions (PHI) is crucial for addressing bacterial infections. Some existing computational methods for predicting PHI are suboptimal in terms of prediction efficiency due to the limited types of available information. Despite the emergence of some supporting information, the generalizability of models using this information is limited by the small scale of the databases. Additionally, most existing models overlook the sparsity of association data, which severely impacts their predictive performance as well. In this study, we propose a dual-view sparse network model (DSPHI) to predict PHI, which leverages logical probability theory and network sparsification. Specifically, we first constructed similarity networks using the sequences of phages and hosts respectively, and then sparsified these networks, enabling the model to focus more on key information during the learning process, thereby improving prediction efficiency. Next, we utilize logical probability theory to compute high-order logical information between phages (hosts), which is known as mutual information. Subsequently, we connect this information in node form to the sparse phage (host) similarity network, resulting in a phage (host) heterogeneous network that better integrates the two information views, thereby reducing the complexity of model computation and enhancing information aggregation capabilities. The hidden features of phages and hosts are explored through graph learning algorithms. Experimental results demonstrate that mutual information is effective information in predicting PHI, and the sparsification procedure of similarity networks significantly improves the model's predictive performance.

Keywords: graph convolutional network; logical probability theory; metagenomic data; network sparsification; phage–host interactions.

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Bacteria / virology
  • Bacteriophages* / genetics
  • Computational Biology / methods
  • Host-Pathogen Interactions
  • Probability Theory