Host defense peptides are promising candidates for the development of novel antibiotics. To realize their therapeutic potential, a high level of target selectivity is essential. This study aims to identify the factors governing selectivity by using the random forest algorithm to correlate peptide sequence information with bioactivity data. Out-of-bag prediction yielded satisfactory predictive models, with accuracies in excess of 0.80 and Matthews correlation coefficients in excess of 0.57. Model interpretation using variable importance metrics and partial dependence plots indicated that selectivity was heavily influenced by the composition and distribution patterns of parameters related to molecular charge and solubility. Furthermore, the three investigated bacterial target species (Escherichia coli, Pseudomonas aeruginosa and Staphylococcus aureus) likely had a significant influence on how selectivity was realized: there appears to be a similar underlying selectivity mechanism based on charge-solubility properties, but one that is tailored to the target in question.
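For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient can be computed from binary predictions. The labels are hypothetical and purely illustrative (they are not data from this study), and the function names are assumptions introduced here rather than the authors' implementation.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = selective, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that match the true label."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical labels, for illustration only (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
mcc = matthews_corrcoef(*counts)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced; this is a likely reason to report both figures side by side.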
Keywords: Antimicrobial peptides; Data science; Host defense peptides; QSAR; Selectivity; Structure-activity relationship.
Copyright © 2021 Elsevier Inc. All rights reserved.