Background: Protein-protein interaction (PPI) sites play a crucial role in biochemical processes, which makes their prediction an important task. Investigating the interactions between viruses and receptor proteins through biological techniques aids in understanding disease mechanisms and guides the development of corresponding drugs. Although various prediction methods have been proposed, they often suffer from long processing times, high costs, and low accuracy.
Results: To address these challenges, we propose a novel protein-protein interaction site prediction network based on multi-information fusion. In our approach, the initial amino acid features are represented by the position-specific scoring matrix, hidden Markov model, dictionary of protein secondary structure, and one-hot encoding. We then adopt a multi-channel approach to extract deep amino acid features from different perspectives. The graph convolutional network channel extracts spatial structural information. The bidirectional long short-term memory channel treats the amino acid sequence as natural language, capturing the protein's primary structure information. The ProtT5 protein large language model channel outputs a more comprehensive amino acid embedding representation, complementing the two aforementioned channels. Finally, the resulting amino acid features are fed into the prediction layer for the final prediction.
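The multi-channel idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the one-hot feature and a single weight-free graph-convolution step (symmetric normalisation with self-loops), and it assumes the channels are fused by simple concatenation, which the abstract does not specify. The function names (`one_hot`, `gcn_propagate`, `fuse`), the toy sequence, and the contact edges are all hypothetical; the Bi-LSTM and ProtT5 channels, learned weights, and non-linearities are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Return one 20-dimensional indicator vector per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:  # unknown residues stay all-zero
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def gcn_propagate(features, edges):
    """One weight-free graph-convolution step: D^-1/2 (A + I) D^-1/2 X.

    `edges` is a list of undirected residue contacts (i, j).
    A trained layer would also apply a weight matrix and an activation.
    """
    n = len(features)
    neighbours = [{i} for i in range(n)]  # start with self-loops
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    degree = [len(nb) for nb in neighbours]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbours[i]:
            w = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim):
                row[k] += w * features[j][k]
        out.append(row)
    return out


def fuse(*channels):
    """Concatenate the per-residue vectors of every channel (assumed fusion rule)."""
    n = len(channels[0])
    return [[v for ch in channels for v in ch[i]] for i in range(n)]


# Toy example: a three-residue chain with hypothetical contacts 0-1 and 1-2.
sequence = "ACD"
edges = [(0, 1), (1, 2)]
x = one_hot(sequence)
g = gcn_propagate(x, edges)
fused = fuse(x, g)  # one 40-dimensional vector per residue
```

In a full model, each residue's fused vector would be passed to a classifier that outputs the probability of that residue being an interaction site.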
Conclusion: Compared with six protein structure-based methods and six protein sequence-based methods, our model achieves the best performance across the evaluation metrics, including accuracy, precision, F1, Matthews correlation coefficient, and area under the precision-recall curve, demonstrating its superiority.
Keywords: Bi-LSTM; GCN; ProtT5; Protein–protein interaction (PPI) sites.
© 2024. The Author(s).