Subcellular localization for Gram positive and Gram negative bacterial proteins using linear interpolation smoothing model

Harsh Saini; Gaurav Raicar; Abdollah Dehzangi; Sunil Lal; Alok Sharma

doi:10.1016/j.jtbi.2015.08.020

J Theor Biol. 2015 Dec 7;386:25-33. doi: 10.1016/j.jtbi.2015.08.020. Epub 2015 Sep 18.

Authors

Harsh Saini¹, Gaurav Raicar², Abdollah Dehzangi³, Sunil Lal⁴, Alok Sharma⁵

Affiliations

¹ University of the South Pacific, Fiji.
² University of the South Pacific, Fiji.
³ Griffith University, Australia.
⁴ University of the South Pacific, Fiji.
⁵ University of the South Pacific, Fiji; Griffith University, Australia.

PMID: 26386142

Abstract

Protein subcellular localization is an important topic in proteomics because it is related to a protein's overall function, aids the understanding of metabolic pathways, and supports drug design and discovery. In this paper, a basic approximation technique from natural language processing, the linear interpolation smoothing model, is applied to predict protein subcellular localizations. The proposed approach extracts features from syntactical information in protein sequences to build probabilistic profiles using dependency models, which are then used in linear interpolation to determine how likely a sequence is to belong to a particular subcellular location. The technique builds a statistical model based on maximum likelihood. It deals effectively with the high dimensionality that hinders traditional classifiers such as Support Vector Machines or k-Nearest Neighbours, without sacrificing performance. The approach has been evaluated by predicting subcellular localizations of Gram positive and Gram negative bacterial proteins.
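
The abstract does not give the feature set, the dependency models, or the interpolation weights, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes raw amino acid sequences as input, unigram and bigram maximum-likelihood profiles per location, and fixed, arbitrarily chosen weights; the function names `train_profile`, `interpolated_log_likelihood` and `predict_location` are hypothetical.

```python
import math
from collections import Counter

# Assumed interpolation weights (bigram, unigram, uniform); they must sum to 1.
# These values are illustrative, not taken from the paper.
WEIGHTS = (0.6, 0.3, 0.1)
ALPHABET_SIZE = 20  # standard amino acids


def train_profile(sequences):
    """Collect the counts needed for maximum-likelihood estimates."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)                # residue counts
        contexts.update(seq[:-1])           # residues that have a successor
        bigrams.update(zip(seq, seq[1:]))   # adjacent residue pairs
    return unigrams, contexts, bigrams


def interpolated_log_likelihood(seq, profile, weights=WEIGHTS):
    """Score a sequence under one location's linearly interpolated model."""
    unigrams, contexts, bigrams = profile
    w_bi, w_uni, w_flat = weights
    total = sum(unigrams.values())
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p_bi = bigrams[(prev, cur)] / contexts[prev] if contexts[prev] else 0.0
        p_uni = unigrams[cur] / total if total else 0.0
        # Linear interpolation: the uniform term keeps p strictly positive,
        # so unseen residue pairs never produce log(0).
        p = w_bi * p_bi + w_uni * p_uni + w_flat / ALPHABET_SIZE
        score += math.log(p)
    return score


def predict_location(seq, profiles):
    """Return the location whose profile gives the highest log-likelihood."""
    return max(profiles,
               key=lambda loc: interpolated_log_likelihood(seq, profiles[loc]))
```

With one profile per subcellular location, for example `profiles = {loc: train_profile(seqs) for loc, seqs in training_data.items()}` (where `training_data` is a hypothetical mapping from location name to sequences), `predict_location(seq, profiles)` picks the most likely location. Because each profile is just a table of counts, the model avoids fitting a classifier in a high-dimensional feature space.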

Keywords: Dependency models; Feature extraction; Hidden Markov models; Natural language processing.

MeSH terms

Algorithms
Bacterial Proteins / metabolism*
Gram-Negative Bacteria / metabolism*
Gram-Positive Bacteria / metabolism*
Markov Chains
Models, Statistical
Natural Language Processing
Proteomics / methods*
Sensitivity and Specificity
Subcellular Fractions / metabolism

Substances

Bacterial Proteins