MCNN-AAPT: accurate classification and functional prediction of amino acid and peptide transporters in secondary active transporters using protein language models and multi-window deep learning

J Biomol Struct Dyn. 2024 Nov 22:1-10. doi: 10.1080/07391102.2024.2431664. Online ahead of print.

Abstract

Secondary active transporters play a crucial role in cellular physiology by facilitating the movement of molecules across cell membranes. Identifying the functional classes of these transporters, particularly amino acid and peptide transporters, is essential for understanding their involvement in various physiological processes and disease pathways, including cancer. This study aims to develop a robust computational framework that integrates pre-trained protein language models and deep learning techniques to classify amino acid and peptide transporters within the secondary active transporter (SAT) family and predict their functional association with solute carrier (SLC) proteins. The study leverages a comprehensive dataset of 448 secondary active transporters, including 36 solute carrier proteins, obtained from UniProt and the Transporter Classification Database (TCDB). Three state-of-the-art protein language models, ProtTrans, ESM-1b, and ESM-2, are evaluated within a deep learning neural network architecture that employs a multi-window scanning technique to capture local and global sequence patterns. The ProtTrans-based feature set demonstrates exceptional performance, achieving a classification accuracy of 98.21% with 87.32% sensitivity and 99.76% specificity for distinguishing amino acid and peptide transporters from other SATs. Furthermore, the model maintains strong predictive ability for SLC proteins, with an overall accuracy of 88.89% and a Matthews Correlation Coefficient (MCC) of 0.7750. This study showcases the power of integrating pre-trained protein language models and deep learning techniques for the functional classification of secondary active transporters and the prediction of associated solute carrier proteins. The findings have significant implications for drug development, disease research, and the broader understanding of cellular transport mechanisms.
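The abstract does not give the architecture in detail, so the following is only a minimal sketch of what a multi-window scan over per-residue language-model embeddings could look like. It assumes that "multi-window scanning" means parallel 1D convolutions of different widths, each followed by a ReLU and global max pooling. The function name `multi_window_features`, the window sizes, the filter count, and the toy embedding size are hypothetical, and random filters stand in for learned weights.

```python
import numpy as np


def multi_window_features(embeddings, window_sizes=(2, 4, 8), n_filters=4, seed=0):
    """Scan a (length, dim) embedding matrix with several window widths.

    Hypothetical illustration: random filters replace the learned
    convolution kernels of a trained network.
    """
    rng = np.random.default_rng(seed)
    length, dim = embeddings.shape
    pooled = []
    for w in window_sizes:
        if w > length:
            raise ValueError(f"window size {w} exceeds sequence length {length}")
        # One bank of filters per window width, shape (n_filters, w, dim).
        filters = rng.standard_normal((n_filters, w, dim))
        # All contiguous windows of width w, shape (length - w + 1, w, dim).
        windows = np.stack([embeddings[i:i + w] for i in range(length - w + 1)])
        # Filter response at every position, shape (positions, n_filters).
        responses = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
        responses = np.maximum(responses, 0.0)  # ReLU
        # Global max pooling keeps the strongest response of each filter.
        pooled.append(responses.max(axis=0))
    # Concatenate the pooled outputs of all window widths into one vector.
    return np.concatenate(pooled)


# Toy input: 50 residues with 16-dimensional embeddings (real PLM embeddings are much larger).
toy_embeddings = np.random.default_rng(1).standard_normal((50, 16))
features = multi_window_features(toy_embeddings)
```

In this sketch, narrow windows respond to short local motifs and wide windows to longer stretches of sequence, and the concatenated vector would then be passed to a classifier layer.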

Keywords: Functional classification; drug targeting; protein language models (PLMs); secondary active transporters; solute carrier proteins (SLC).

Plain language summary

This study presents a novel computational approach that integrates pre-trained protein language models (ProtTrans, ESM-1b, and ESM-2) with deep learning techniques to achieve two key objectives: (1) accurately classify amino acid and peptide transporters within the secondary active transporter (SAT) family, and (2) predict the functional association of SATs with solute carrier (SLC) proteins. We compiled a comprehensive dataset of 448 secondary active transporters, including 36 SLC proteins, and evaluated the performance of the different language models within a deep learning neural network framework. The results demonstrate the exceptional performance of the ProtTrans-based feature set, which achieves 98.21% accuracy, 87.32% sensitivity, and 99.76% specificity in distinguishing amino acid and peptide transporters from other SATs, while also maintaining strong predictive ability for SLC proteins.
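For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and the Matthews Correlation Coefficient follow from a binary confusion matrix using their standard definitions. The helper name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true positives that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC stays informative when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the study's confusion matrix).
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Because amino acid and peptide transporters form a minority of the dataset, a gap between sensitivity and specificity such as the one reported here is easier to interpret alongside MCC than from accuracy alone.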