The enormous diversity of bacteriophages and their bacterial hosts makes it challenging to predict which phages infect a focal set of bacteria. Infection is largely determined by the complementary (and largely uncharacterized) genetics of adsorption, injection, cell take-over, and lysis. Here we present a machine learning approach to predict phage-bacteria interactions, trained on the genome sequences of, and phenotypic interactions among, 51 Escherichia coli strains and 45 phage λ strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without a priori knowledge of driver mutations, this framework predicts both who infects whom and the quantitative levels of infection across a suite of 2,295 potential interactions. We found that the most effective approach inferred interaction phenotypes from independent contributions of phage and bacterial mutations, accurately predicting 86% of interactions while reducing the relative error in the estimated strength of the infection phenotype by 40%. Feature selection revealed key phage λ and Escherichia coli mutations that significantly influence the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage λ infections and identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. The method's success in recapitulating strain-level infection outcomes arising during coevolutionary dynamics may also help inform generalized approaches for imputing genetic drivers of interaction phenotypes in complex communities of phage and bacteria.
Keywords: coevolution; driver; genotype; machine learning; mutation; phage; phenotype.
© The Author(s) 2024. Published by Oxford University Press.