This study delves into the prediction of protein-peptide interactions using advanced machine learning techniques, comparing models such as sequence-based, standard CNNs, and traditional classifiers. Leveraging pre-trained language models and multi-view window scanning CNNs, our approach yields significant improvements, with ProtTrans standing out based on 2.1 billion protein sequences and 393 billion amino acids. The integrated model demonstrates remarkable performance, achieving an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively. Additionally, it attains a Precision of 0.564 in PepBCL Set 1 and 0.527 in PepBCL Set 2, surpassing the performance of previous methods. Beyond this, we explore the application of this model in cancer therapy, particularly in identifying peptide interactions for selective targeting of cancer cells, and other fields. The findings of this study contribute to bioinformatics, providing valuable insights for drug discovery and therapeutic development.
Keywords: Bioinformatics; Machine learning; Multi-view window scanning CNNs; Pre-trained language models; Protein-peptide interactions.
Copyright © 2024 Elsevier Inc. All rights reserved.