Prediction of mitochondrial proteins based on genetic algorithm - partial least squares and support vector machine

Amino Acids. 2007 Nov;33(4):669-75. doi: 10.1007/s00726-006-0465-0. Epub 2007 Aug 15.

Abstract

Mitochondria are essential cell organelles of eukaryotes. Hence, it is vitally important to develop an automated and reliable method for timely identification of novel mitochondrial proteins. In this study, mitochondrial proteins were encoded by dipeptide composition; then, the genetic algorithm-partial least squares (GA-PLS) method was used to identify the dipeptide composition elements that are most important in recognizing mitochondrial proteins; finally, these selected dipeptide composition elements were used as input to support vector machine (SVM)-based classifiers to predict mitochondrial proteins. All the models were trained and validated by the jackknife cross-validation test. The prediction accuracy is 85%, suggesting that the method performs reasonably well in predicting mitochondrial proteins. Our results strongly imply that not all the dipeptide compositions are informative and indispensable for predicting proteins. The MATLAB source code and the dataset are available from the authors on request.
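
The dipeptide composition encoding mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function name, the toy sequence, and the choice to skip pairs containing non-standard residues and to normalize by the number of counted pairs are all assumptions made here for illustration. It maps a protein sequence to a 400-element vector, one frequency per ordered pair of the 20 standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies (hypothetical helper)."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # assumption: ignore pairs with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Assumption: normalize by the number of counted pairs.
    return [c / total for c in counts]


# Toy sequence, not taken from the paper's dataset.
features = dipeptide_composition("MKVLAAGIVGL")
```

In a pipeline like the one described, vectors of this kind would be the input to feature selection (GA-PLS in the paper) and then to the SVM classifier; those steps are not reproduced here.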

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Computational Biology / methods*
  • Dipeptides / analysis*
  • Dipeptides / chemistry
  • Mitochondrial Proteins / chemistry*

Substances

  • Dipeptides
  • Mitochondrial Proteins