vHULK, a New Tool for Bacteriophage Host Prediction Based on Annotated Genomic Features and Neural Networks

Phage (New Rochelle). 2022 Dec 1;3(4):204-212. doi: 10.1089/phage.2021.0016. Epub 2022 Dec 19.

Abstract

Background: The experimental determination of a bacteriophage host is a laborious procedure. Thus, there is a pressing need for reliable computational predictions of bacteriophage hosts.

Materials and methods: We developed the program vHULK for phage host prediction based on 9504 phage genome features, which consider alignment significance scores between predicted proteins and a curated database of viral protein families. The features were fed to a neural network, and two models were trained to predict 77 host genera and 118 host species.
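As a rough illustration of the pipeline described above, the sketch below turns alignment hits into a fixed-length feature vector and passes it through a single softmax layer. It is not the vHULK implementation. The names `build_feature_vector` and `predict_host`, the choice to keep the best score per protein family, and the one-layer network are all assumptions made for illustration; the abstract specifies only that 9504 features derived from alignment significance scores are fed to a neural network.

```python
import math

def build_feature_vector(hits, family_index):
    """Build one feature per protein family from alignment hits.

    hits: iterable of (family_id, score) pairs for one phage genome.
    family_index: dict mapping family_id -> column position.
    Assumption: the best (maximum) score per family is kept.
    """
    vec = [0.0] * len(family_index)
    for family_id, score in hits:
        col = family_index.get(family_id)
        if col is not None:  # ignore families outside the feature set
            vec[col] = max(vec[col], score)
    return vec

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_host(vec, weights, biases, labels):
    """Hypothetical single-layer classifier: one weight row per host label."""
    logits = [sum(w * x for w, x in zip(row, vec)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]
```

In this toy setting the feature vector would have three columns rather than 9504, and `labels` would list two host genera rather than 77; the shapes scale directly.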

Results: In controlled random test sets with 90% redundancy reduction in terms of protein similarity, vHULK obtained on average 83% precision and 79% recall at the genus level, and 71% precision and 67% recall at the species level. The performance of vHULK was compared against three other tools on a test data set with 2153 phage genomes. On this data set, vHULK achieved better performance at both the genus and the species levels than the other tools.
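For readers less familiar with the metrics reported above, the following minimal sketch computes precision and recall per host label and averages them across labels (macro-averaging). The averaging scheme and the function name `macro_precision_recall` are assumptions for illustration; the abstract does not state how the averages were computed.

```python
def macro_precision_recall(true_labels, predicted_labels):
    """Return (precision, recall), each averaged over all observed labels."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        # A label with no predictions (or no true cases) contributes 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)
```

Precision answers how often a predicted host is correct, whereas recall answers how many phages of a given host are recovered, which is why the two values reported at each taxonomic level differ.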

Conclusions: Our results suggest that vHULK represents an advance on the state of the art in phage host prediction.

Keywords: Acinetobacter baumannii; Enterococcus faecium; Klebsiella pneumoniae; Pseudomonas aeruginosa; Staphylococcus aureus; machine learning.