PepCA: Unveiling protein-peptide interaction sites with a multi-input neural network model

iScience. 2024 Aug 30;27(10):110850. doi: 10.1016/j.isci.2024.110850. eCollection 2024 Oct 18.

Abstract

Protein-peptide interactions play a pivotal role in fields such as drug development, yet they remain underexplored experimentally and challenging to model computationally. Here, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting protein-peptide interactions is the difficulty of acquiring precise protein structures, coupled with the uncertainty of polypeptide configurations. To address this, we first encode protein sequences using the pre-trained Evolutionary Scale Modeling 2 (ESM-2) model to extract latent structural information. We then apply a multi-input coattention mechanism that concurrently updates the encodings of both peptide and protein residues. PepCA integrates this module within an encoder-decoder structure. The model identifies binding sites with high precision, offering insights for peptide drug development and protein science.

Keywords: Biomolecules; Machine learning; Molecular interaction; Protein folding; Software program for structure determination.
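
The abstract describes a coattention step that updates peptide and protein residue encodings concurrently. The following is a minimal sketch of a generic coattention update, not PepCA's actual module: the function name `coattention`, the scaled dot-product affinity, the residual update, and the toy embeddings are all assumptions made for illustration. In a real pipeline the protein rows would be per-residue embeddings from a pre-trained model such as ESM-2 rather than hand-written numbers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def coattention(protein, peptide):
    """Hypothetical coattention update (illustrative, not the published model).

    protein: n residue embeddings of dimension d
    peptide: m residue embeddings of dimension d
    Both updates are computed from the same original inputs, so neither
    side sees the other's already-updated encoding.
    """
    d = len(protein[0])
    scale = math.sqrt(d)
    # affinity[i][j]: scaled dot product of protein residue i and peptide residue j
    affinity = [[v / scale for v in row]
                for row in matmul(protein, transpose(peptide))]
    # Each protein residue attends over peptide residues (n x m).
    prot_attn = [softmax(row) for row in affinity]
    # Each peptide residue attends over protein residues (m x n).
    pep_attn = [softmax(row) for row in transpose(affinity)]
    # Context vectors: attention-weighted sums of the other sequence.
    prot_ctx = matmul(prot_attn, peptide)
    pep_ctx = matmul(pep_attn, protein)
    # Residual update of both encodings.
    new_protein = [[p + c for p, c in zip(pr, cr)]
                   for pr, cr in zip(protein, prot_ctx)]
    new_peptide = [[q + c for q, c in zip(qr, cr)]
                   for qr, cr in zip(peptide, pep_ctx)]
    return new_protein, new_peptide, prot_attn, pep_attn


# Toy inputs: 3 protein residues and 2 peptide residues, dimension 2.
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
peptide = [[1.0, 0.0], [0.0, 2.0]]
new_protein, new_peptide, prot_attn, pep_attn = coattention(protein, peptide)
```

In this sketch, each row of `prot_attn` is a probability distribution over peptide residues, and each row of `pep_attn` is one over protein residues. A binding-site predictor would typically pass the updated protein encodings to a per-residue classifier, which is omitted here.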