DeepPL: A deep-learning-based tool for the prediction of bacteriophage lifecycle

PLoS Comput Biol. 2024 Oct 17;20(10):e1012525. doi: 10.1371/journal.pcbi.1012525. eCollection 2024 Oct.

Abstract

Bacteriophages (phages) are viruses that infect bacteria and can be classified by lifecycle into two groups. Virulent (lytic) phages follow a lytic cycle that lyses the bacterial host after infection. Temperate (lysogenic) phages can integrate their genomes into the bacterial chromosome and replicate with the host via the lysogenic cycle. Identifying a phage's lifecycle is a crucial step in developing suitable applications for it. As alternatives to laborious traditional biological experiments, several tools have been designed to predict phage lifecycle using different algorithms, such as random forest (RF), linear support-vector classifier (SVC), and convolutional neural network (CNN). In this study, we developed a natural language processing (NLP)-based tool, DeepPL, for predicting phage lifecycles from nucleotide sequences. On the test set, DeepPL achieved an accuracy of 94.65%, with a sensitivity of 92.24% and a specificity of 95.91%. Moreover, DeepPL achieved 100% accuracy in lifecycle prediction on phages that we had previously isolated and biologically verified in the lab. Additionally, a mock phage community metagenomic dataset was used to test the potential use of DeepPL in viral metagenomic research. DeepPL achieved 100% accuracy on individual complete phage genomes and accuracies ranging from 71.14% to 100% on phage contigs produced by various next-generation sequencing technologies. Overall, our study indicates that DeepPL performs reliably on phage lifecycle prediction using only nucleotide sequences and can be applied to future phage and metagenomic research.
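
The accuracy, sensitivity, and specificity quoted in the abstract are standard binary-classification metrics. The following minimal Python sketch shows how such values could be computed from true and predicted lifecycle labels. It is not taken from DeepPL: the function name `binary_metrics`, the label strings, and the choice of "temperate" as the positive class are assumptions made for illustration, since the abstract does not state which lifecycle was treated as positive.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "temperate",  # assumption: the abstract does not name the positive class
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for two-class labels.

    Hypothetical helper for illustration only; not part of DeepPL.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive phage correctly called positive
            else:
                fn += 1  # positive phage missed
        else:
            if pred == positive:
                fp += 1  # negative phage wrongly called positive
            else:
                tn += 1  # negative phage correctly called negative

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined when a class is absent from y_true; report NaN instead of failing.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
    }


if __name__ == "__main__":
    # Toy labels invented for this example; they are not data from the study.
    truth = ["temperate", "temperate", "virulent", "virulent", "virulent"]
    preds = ["temperate", "virulent", "virulent", "virulent", "temperate"]
    print(binary_metrics(truth, preds))
```

Swapping the `positive` argument to "virulent" exchanges the roles of sensitivity and specificity, so the positive-class convention needs to be known before comparing such figures across tools.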

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Bacteria / virology
  • Bacteriophages* / genetics
  • Bacteriophages* / physiology
  • Computational Biology* / methods
  • Deep Learning*
  • Genome, Viral / genetics
  • Natural Language Processing
  • Neural Networks, Computer

Grants and funding

This work was supported by the U.S. Department of Agriculture-Agricultural Research Service Current Research Information System projects (2030-42000-055-000-D to YZ, YL, and VCHW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.