Protein classification using sequential pattern mining

Conf Proc IEEE Eng Med Biol Soc. 2006:2006:5814-7. doi: 10.1109/IEMBS.2006.260336.

Abstract

Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work, sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed to analyze protein primary structure. A classifier then uses the extracted sequential patterns to assign proteins of unknown structure to the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem with 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
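To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that sequential patterns have already been mined per fold (for example by cSPADE, which is not reimplemented here), and it illustrates two ideas only: testing whether a pattern occurs in a primary sequence as an ordered, optionally gap-constrained subsequence, and measuring accuracy when the k most probable folds are considered. The function names (contains_pattern, rank_folds, top_k_accuracy), the max_gap parameter, and the scoring rule (fraction of a fold's patterns found in the sequence) are hypothetical choices made for illustration; the abstract does not specify the classifier.

```python
def contains_pattern(sequence, pattern, max_gap=None):
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    max_gap is the largest number of residues allowed between two consecutive
    pattern items (None means unlimited). Hypothetical helper for illustration.
    """
    if not pattern:
        return True
    # Positions where the first pattern item matches.
    ends = [i for i, residue in enumerate(sequence) if residue == pattern[0]]
    for item in pattern[1:]:
        next_ends = []
        for j, residue in enumerate(sequence):
            if residue != item:
                continue
            # Position j extends a partial match if some earlier match ends
            # before j and the gap between them is within the allowed limit.
            if any(i < j and (max_gap is None or j - i - 1 <= max_gap)
                   for i in ends):
                next_ends.append(j)
        ends = next_ends
        if not ends:
            return False
    return True


def rank_folds(sequence, fold_patterns, max_gap=None):
    """Rank folds by the fraction of their patterns found in `sequence`.

    fold_patterns maps a fold label to a list of pattern strings. This scoring
    rule is an assumption, not the classifier used in the paper.
    """
    scores = {}
    for fold, patterns in fold_patterns.items():
        hits = sum(contains_pattern(sequence, p, max_gap) for p in patterns)
        scores[fold] = hits / len(patterns) if patterns else 0.0
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda fold: (-scores[fold], fold))


def top_k_accuracy(rankings, true_folds, k):
    """Fraction of proteins whose true fold is among the k top-ranked folds."""
    correct = sum(true in ranked[:k]
                  for ranked, true in zip(rankings, true_folds))
    return correct / len(true_folds)


if __name__ == "__main__":
    # Toy patterns and sequences, invented purely for this example.
    fold_patterns = {"a": ["AC", "DE"], "b": ["GG"], "c": ["AC", "XX"]}
    sequences = ["ACDE", "GGAC"]
    true_folds = ["c", "b"]
    rankings = [rank_folds(s, fold_patterns) for s in sequences]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(rankings, true_folds, k))
```

Under this reading, the 36% figure in the abstract corresponds to k = 1 and the 65% figure to k = 3 over the 17 candidate folds.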

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computer Simulation
  • Databases, Protein*
  • Humans
  • Markov Chains
  • Models, Genetic
  • Models, Molecular
  • Models, Statistical
  • Neural Networks, Computer
  • Pattern Recognition, Automated*
  • Protein Conformation
  • Protein Folding
  • Proteins / chemistry*
  • Sequence Analysis, Protein

Substances

  • Proteins