Globular proteins (GPs) play vital roles in a wide range of biological processes, including enzymatic catalysis and immune responses. Among them, enzymes catalyse biochemical reactions, while others, such as haemoglobin, carry out essential physiological functions such as oxygen transport. Given these roles, accurate identification of GPs is essential. To address this need, this research introduces a hybrid deep learning model called Deep-GP. We generated two datasets based on primary sequences and developed a novel feature descriptor called the Consensus Sequence-based Trisection-Position Specific Scoring Matrix (CST-PSSM). Models were trained using deep learning techniques, including the bidirectional long short-term memory network (BiLSTM), gated recurrent unit (GRU), and convolutional neural network (CNN). The BiLSTM and CNN were hybridised for ensemble learning. The CST-PSSM-based ensemble model achieved the most accurate predictions, outperforming other competitive predictors on both the training and testing datasets. This demonstrates the potential of deep learning for precise GP prediction as a robust tool to expedite research, streamline drug discovery, and unveil novel therapeutic targets.
Keywords: bioinformatics; biological techniques.
© 2024 The Author(s). IET Systems Biology published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.