Bacteriophage virion proteins (BVPs) are bacterial viruses that have a great impact on different biological functions of bacteria. They are significantly used in genetic engineering and phage therapy applications. Correct identification of BVP through conventional pathogen methods are slow and expensive. Thus, designing a Bioinformatics predictor is urgently desirable to accelerate correct identification of BVPs within a huge volume of proteins. However, available prediction tools performance is inadequate due to the lack of useful feature representation and severe imbalance issue. In the present study, we propose an intelligent model, called Pred-BVP-Unb for discrimination of BVPs that employed three nominal sequences-driven descriptors, i.e. Bi-PSSM evolutionary information, composition & translation, and split amino acid composition. The imbalance phenomena between classes were coped with the help of a synthetic minority oversampling technique. The essential attributes are selected by a robust algorithm called recursive feature elimination. Finally, the optimal feature space is provided to support vector machine classifier using a radial base kernel in order to train the model. Our predictor remarkably outperforms than existing approaches in the literature by achieving the highest accuracy of 92.54% and 83.06% respectively on the benchmark and independent datasets. We expect that Pred-BVP-Unb tool can provide useful hints for designing antibacterial drugs and also helpful to expedite large scale discovery of new bacteriophage virion proteins. The source code and all datasets are publicly available at https://github.com/Muhammad-Arif-NUST/BVP_Pred_Unb.
Keywords: Bacteriophage virion proteins; Bi-profile evolutionary information; Recursive feature elimination; Split amino acid composition; Support vector machine.
Copyright © 2019 Elsevier Inc. All rights reserved.