Mimicking cellular sorting improves prediction of subcellular localization

J Mol Biol. 2005 Apr 22;348(1):85-100. doi: 10.1016/j.jmb.2005.02.025.

Abstract

Predicting the native subcellular compartment of a protein is an important step toward elucidating its function. Here we introduce LOCtree, a hierarchical system combining support vector machines (SVMs) and other prediction methods. LOCtree predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features in its input. Currently, LOCtree does not predict localization for membrane proteins, since the compositional properties of membrane proteins differ significantly from those of non-membrane proteins. While any information about function can be used by the system, we present estimates of performance that are valid when only the amino acid sequence of a protein is known. When evaluated on a non-redundant test set, LOCtree achieved sustained levels of 74% accuracy for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes. We rigorously benchmarked LOCtree against the best alternative methods for localization prediction. LOCtree outperformed all other methods in nearly all benchmarks. Localization assignments using LOCtree agreed quite well with data from recent large-scale experiments. Our preliminary analysis of a few entirely sequenced organisms, namely human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana), suggested that over 35% of all non-membrane proteins are nuclear, that about 20% are retained in the cytosol, and that every fifth protein in the weed resides in the chloroplast.
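The hierarchical idea described above, a cascade of binary decisions that follows the order of cellular sorting, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tree layout, the compartment labels, the use of plain amino acid composition as the only feature, and the hand-set threshold rules, which merely stand in for trained SVMs. It is not the LOCtree implementation and makes no claim about the features or decision boundaries the authors used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return {a: seq.count(a) / total for a in AMINO_ACIDS}


def fraction(feats: Dict[str, float], residues: str) -> float:
    """Summed composition over a group of residues."""
    return sum(feats[r] for r in residues)


@dataclass
class Node:
    """One binary decision; in a real system this would be a trained classifier."""
    decide: Callable[[Dict[str, float]], bool]  # True -> left branch, False -> right
    left: Union["Node", str]                    # subtree or final compartment label
    right: Union["Node", str]


# Hypothetical tree: each rule is a placeholder threshold, NOT a learned model.
TREE = Node(
    decide=lambda f: fraction(f, "AILVF") > 0.4,   # toy stand-in: secretory pathway?
    left=Node(
        decide=lambda f: f["C"] > 0.05,            # toy stand-in: extracellular?
        left="extracellular",
        right="organelle",
    ),
    right=Node(
        decide=lambda f: fraction(f, "KR") > 0.15,  # toy stand-in: nuclear?
        left="nucleus",
        right=Node(
            decide=lambda f: fraction(f, "DE") > 0.2,  # toy stand-in: cytosol?
            left="cytosol",
            right="mitochondrion",
        ),
    ),
)


def predict(seq: str, tree: Union[Node, str] = TREE) -> str:
    """Walk the tree from the root, one binary decision at a time, to a leaf label."""
    feats = composition(seq)
    node = tree
    while isinstance(node, Node):
        node = node.left if node.decide(feats) else node.right
    return node


if __name__ == "__main__":
    print(predict("KKRRKKRRAG"))
```

The point of the layout is that each node only has to separate two groups of compartments, so a mistake at an early node cannot be corrected further down; the order of the decisions therefore matters as much as the individual classifiers.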

MeSH terms

  • Animals
  • Chloroplasts / chemistry
  • Computer Simulation*
  • Humans
  • Plant Proteins / chemistry
  • Plant Proteins / metabolism
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Proteome / analysis
  • Reproducibility of Results
  • Subcellular Fractions / chemistry*
  • Subcellular Fractions / metabolism

Substances

  • Plant Proteins
  • Proteins
  • Proteome