Background: The experimental determination of a bacteriophage host is a laborious procedure. Thus, there is a pressing need for reliable computational predictions of bacteriophage hosts.
Materials and methods: We developed the program vHULK for phage host prediction based on 9504 phage genome features, which capture the significance scores of alignments between predicted proteins and a curated database of viral protein families. The features were fed to a neural network, and two models were trained: one to predict 77 host genera and the other to predict 118 host species.
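As an illustration of how alignment significance scores against protein families could be turned into a fixed-length feature vector, the following is a minimal sketch and not the vHULK implementation. The function name `build_feature_vector`, the use of -log10(E-value) as the score, the choice of keeping the best hit per family, the cap of 300, and the three toy family identifiers are all assumptions made for this example; the actual tool uses 9504 features whose exact construction is not described here.

```python
import math


def build_feature_vector(hits, family_index, cap=300.0):
    """Return one score per protein family (hypothetical feature layout).

    hits: iterable of (family_id, e_value) pairs for one phage genome.
    family_index: dict mapping family_id to a position in the vector.
    cap: upper bound on the score, also used when the E-value is 0.
    """
    features = [0.0] * len(family_index)
    for family_id, e_value in hits:
        idx = family_index.get(family_id)
        if idx is None:
            # Ignore hits to families that are not part of the feature set.
            continue
        if e_value <= 0.0:
            score = cap
        else:
            # Assumed transform: smaller E-values give larger scores.
            score = min(cap, max(0.0, -math.log10(e_value)))
        # Keep the most significant hit seen for this family.
        features[idx] = max(features[idx], score)
    return features


# Toy example with three hypothetical families.
family_index = {"famA": 0, "famB": 1, "famC": 2}
hits = [("famA", 1e-5), ("famA", 1e-20), ("famC", 0.0), ("famX", 1e-3)]
vector = build_feature_vector(hits, family_index)
```

A vector of this kind has the same length for every genome, which is what allows it to serve as input to a neural network classifier.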
Results: In controlled random test sets with 90% redundancy reduction in terms of protein similarity, vHULK obtained on average 83% precision and 79% recall at the genus level, and 71% precision and 67% recall at the species level. The performance of vHULK was compared against three other tools on a test data set with 2153 phage genomes. On this data set, vHULK achieved better performance at both the genus and the species levels than the other tools.
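For readers unfamiliar with the metrics, the sketch below shows one common way to compute precision and recall for a multiclass host predictor. It assumes macro-averaging across host classes, which this abstract does not specify; the function name `macro_precision_recall` and the toy labels are illustrative only and are not results from the study.

```python
from collections import Counter


def macro_precision_recall(true_labels, predicted_labels):
    """Compute per-class precision and recall, then average over classes."""
    true_positives = Counter()
    predicted_count = Counter(predicted_labels)
    true_count = Counter(true_labels)
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == prediction:
            true_positives[truth] += 1

    classes = sorted(set(true_labels) | set(predicted_labels))
    # A class that was never predicted (or never occurs) contributes 0.0.
    precisions = [
        true_positives[c] / predicted_count[c] if predicted_count[c] else 0.0
        for c in classes
    ]
    recalls = [
        true_positives[c] / true_count[c] if true_count[c] else 0.0
        for c in classes
    ]
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Toy genus-level labels, made up for illustration.
true_hosts = ["Escherichia", "Escherichia", "Klebsiella",
              "Klebsiella", "Pseudomonas", "Pseudomonas"]
predicted_hosts = ["Escherichia", "Klebsiella", "Klebsiella",
                   "Klebsiella", "Pseudomonas", "Escherichia"]
precision, recall = macro_precision_recall(true_hosts, predicted_hosts)
```

Precision measures how often a predicted host is correct, whereas recall measures how many phages of a given host are recovered, so the two can diverge when predictions concentrate on a few host classes.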
Conclusions: Our results suggest that vHULK represents an advance on the state of the art in phage host prediction.
Keywords: Acinetobacter baumannii; Enterococcus faecium; Klebsiella pneumoniae; Pseudomonas aeruginosa; Staphylococcus aureus; machine learning.
Copyright 2022, Mary Ann Liebert, Inc., publishers.