Protein structure and fold prediction using Tree-Augmented naïve Bayesian classifier

J Bioinform Comput Biol. 2005 Aug;3(4):803-19. doi: 10.1142/s0219720005001302.

Abstract

Due to the large volume of protein sequence data, computational methods for determining the structure class and the fold class of a protein sequence have become essential. Several techniques based on sequence similarity, Neural Networks, Support Vector Machines (SVMs), etc. have been applied. Since most of these methods build a multi-class classifier from binary classifiers, as many as N(N-1)/2 (N choose 2) classifiers may be required for N classes. This paper presents a framework using Tree-Augmented Bayesian Networks (TAN), which performs multi-class classification directly, based on the theory of learning Bayesian Networks and using an improved version of the feature vector representation of Ding et al. (2001). To enhance TAN's performance, the data are pre-processed by feature discretization and the predictions are post-processed using a Mean Probability Voting (MPV) scheme. The advantage of the Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the TAN structure probabilities to determine the significance of each feature (say, hydrophobicity) for each class, which helps to further understand the complexity of protein structure. Experiments on the datasets used in three prominent recent works show that our approach is more accurate than other discriminative methods. The framework is implemented on the BAYESPROT web server, available at http://www-appn.comp.nus.edu.sg/~bioinfo/bayesprot/Default.htm. More detailed results are also available on the above website.
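As a rough illustration of two ideas mentioned in the abstract, the Python sketch below counts the binary classifiers a pairwise (one-vs-one) scheme needs for N classes, and shows one plausible reading of mean probability voting: averaging the class-probability vectors produced by several classifiers and choosing the class with the highest mean. The abstract does not define MPV in detail, so the averaging rule, the function names, and the example numbers are all assumptions made for illustration, not the authors' implementation.

```python
from math import comb


def pairwise_classifier_count(n_classes: int) -> int:
    """Number of binary classifiers in a one-vs-one scheme: N choose 2."""
    return comb(n_classes, 2)


def mean_probability_vote(prob_vectors):
    """Assumed form of mean probability voting (hypothetical helper).

    prob_vectors: list of per-classifier probability lists, one entry per class.
    Returns (index of the class with the highest mean probability, list of means).
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    # Average each class's probability across all classifiers.
    means = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    # Pick the class whose mean probability is largest.
    best = max(range(n_classes), key=lambda c: means[c])
    return best, means


if __name__ == "__main__":
    print(pairwise_classifier_count(4))
    # Made-up probabilities from two hypothetical classifiers over two classes.
    print(mean_probability_vote([[0.6, 0.4], [0.2, 0.8]]))
```

The count grows quadratically with the number of classes, which is the cost the abstract contrasts with a single multi-class TAN model.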

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Artificial Intelligence*
  • Bayes Theorem
  • Computer Simulation
  • Models, Molecular*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Protein Conformation
  • Protein Folding
  • Proteins / analysis*
  • Proteins / chemistry*
  • Proteins / classification
  • Sequence Analysis, Protein / methods*
  • Structure-Activity Relationship

Substances

  • Proteins