VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins

Int J Biol Macromol. 2024 Sep 26;280(Pt 3):136048. doi: 10.1016/j.ijbiomac.2024.136048. Online ahead of print.

Abstract

Vesicular transport is a critical cellular process responsible for the proper organization and functioning of eukaryotic cells. It relies on specialized vesicles that shuttle macromolecules, such as proteins, between cellular compartments, and it is pivotal to maintaining cellular homeostasis. Disruptions in vesicular transport have been linked to various disease mechanisms, including cancer and neurodegenerative disorders. In this study, we present VesiMCNN, a novel computational approach that integrates pre-trained protein language models with a multi-window scanning convolutional neural network architecture to accurately identify vesicular transport proteins. To the best of our knowledge, this is the first study to combine pre-trained protein language models with the multi-window scanning technique for this task. Our method achieved a Matthews Correlation Coefficient (MCC) of 0.558 and an Area Under the Receiver Operating Characteristic curve (AUC-ROC) of 0.933, outperforming existing state-of-the-art approaches. Additionally, we have curated a comprehensive benchmark dataset for the study of vesicular transport proteins, which can facilitate further research in this field. The strong performance of the model, together with the new benchmark dataset, marks a significant advancement in vesicular transport protein research.
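The abstract does not specify the window sizes, filter counts, embedding model, or training procedure, so the sketch below only illustrates the general multi-window scanning idea under stated assumptions. Random vectors stand in for per-residue embeddings from a pre-trained protein language model. The window sizes (2, 4, 8), three filters per window, untrained random weights, and the logistic output layer are all hypothetical choices. The function names `make_filters`, `scan_window`, `multi_window_features`, and `predict_probability` are illustrative and are not taken from the authors' code. Each window size runs its own 1D convolution over the residue axis. ReLU and global max pooling then reduce each filter to a single value, and the pooled values from all windows are concatenated into one feature vector.

```python
import math
import random

def make_filters(window, dim, n_filters, rng):
    """Hypothetical random (untrained) filters: each is a window x dim weight matrix plus a bias."""
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(window)], 0.0)
        for _ in range(n_filters)
    ]

def scan_window(embedding, filters, window):
    """1D convolution over residues with one window size, then ReLU + global max pooling."""
    dim = len(embedding[0])
    zero = [0.0] * dim
    left = window // 2
    # Zero-pad so the number of window positions equals the sequence length
    padded = [zero] * left + embedding + [zero] * (window - 1 - left)
    pooled = []
    for weights, bias in filters:
        # Starting at 0.0 and taking the max gives max(0, max activation),
        # which equals the maximum of the ReLU outputs
        best = 0.0
        for start in range(len(padded) - window + 1):
            activation = bias + sum(
                weights[k][j] * padded[start + k][j]
                for k in range(window)
                for j in range(dim)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled

def multi_window_features(embedding, filter_bank):
    """Concatenate the pooled features from every window size in the bank."""
    features = []
    for window, filters in sorted(filter_bank.items()):
        features.extend(scan_window(embedding, filters, window))
    return features

def predict_probability(features, head_weights, head_bias):
    """Logistic output: score that the protein is a vesicular transport protein."""
    z = head_bias + sum(w * f for w, f in zip(head_weights, features))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
seq_len, dim = 30, 16  # toy sizes; real language-model embeddings are much wider
# Random stand-in for per-residue embeddings from a pre-trained protein language model
embedding = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
windows = (2, 4, 8)  # hypothetical window sizes
filter_bank = {w: make_filters(w, dim, 3, rng) for w in windows}
features = multi_window_features(embedding, filter_bank)
head_weights = [rng.gauss(0.0, 0.1) for _ in features]
prob = predict_probability(features, head_weights, 0.0)
print(len(features), round(prob, 3))
```

Pooling over several window widths lets the classifier respond to sequence patterns of different lengths without fixing one motif size in advance. In a real model, the filters and output weights would be learned from labeled data rather than drawn at random, typically by minimizing a loss such as binary cross-entropy.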

Keywords: Multiple window scanning; Pre-trained protein language model; Vesicular transport protein.