Subcellular localization is a key functional characteristic of proteins. It is determined by signals encoded in the protein sequence. Because the experimental determination of subcellular localization is laborious, a number of computational methods have been developed to predict protein location from sequence. However, predictions made by different methods often disagree with each other, and it is not always clear which algorithm performs best for a given cellular compartment. We benchmarked four primary subcellular localization predictors for proteins from Gram-negative bacteria (PSORTb3, PSLpred, CELLO, and SOSUI-GramN) on a common dataset of 1056 proteins. We found that PSORTb3 performs best on average, but is outperformed by other methods in the prediction of extracellular proteins. This motivated us to develop a meta-predictor, MetaLocGramN, which combines the primary methods using logistic regression models in order to exploit their combined strengths and compensate for their individual weaknesses. MetaLocGramN runs the primary methods and, based on their output, classifies protein sequences into one of five major localizations of the Gram-negative bacterial cell: cytoplasm, plasma membrane, periplasm, outer membrane, and extracellular space. MetaLocGramN achieves an average Matthews correlation coefficient of 0.806, i.e. 12% better than the best individual primary method. MetaLocGramN is thus a meta-predictor specialized in predicting subcellular localization for proteins from Gram-negative bacteria, and according to our benchmark it performs better than any of the primary tools run independently. MetaLocGramN is available as a web and SOAP server, free of charge for all academic users, at http://iimcb.genesilico.pl/MetaLocGramN. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
Copyright © 2012 Elsevier B.V. All rights reserved.
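For readers unfamiliar with the benchmark metric, the following is a minimal sketch of how an average Matthews correlation coefficient could be computed over the five localization classes. The abstract does not state how the average is formed, so the one-vs-rest treatment of each class and the unweighted mean are assumptions made here for illustration. The function names `mcc_one_vs_rest` and `average_mcc` are hypothetical and not part of MetaLocGramN.

```python
import math

# The five localization classes named in the abstract.
LOCALIZATIONS = [
    "cytoplasm",
    "plasma membrane",
    "periplasm",
    "outer membrane",
    "extracellular",
]


def mcc_one_vs_rest(true_labels, predicted_labels, target):
    """Matthews correlation coefficient for one class against all others."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == target and pred == target:
            tp += 1
        elif truth != target and pred != target:
            tn += 1
        elif truth != target and pred == target:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_mcc(true_labels, predicted_labels, classes=LOCALIZATIONS):
    """Unweighted mean of the per-class MCC values (an assumed averaging scheme)."""
    scores = [mcc_one_vs_rest(true_labels, predicted_labels, c) for c in classes]
    return sum(scores) / len(scores)
```

Under these assumptions, a predictor that assigns every protein to its correct compartment scores 1.0, and one that is no better than chance scores close to 0, which makes the metric robust to the unequal class sizes typical of localization datasets.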