Remote homology detection using a kernel method that combines sequence and secondary-structure similarity scores

In Silico Biol. 2009;9(3):89-103.

Abstract

Distant evolutionary relationships between proteins with low sequence similarity are difficult to recognise by computational methods. Consequently, many sequences obtained from large-scale sequencing projects cannot be assigned to any known proteins or families despite being evolutionarily related. To boost sensitivity, various sequence-based methods have been modified to make use of the better-conserved secondary structure. Most of these methods are instance-based or generative. Here, we introduce a kernel-based remote homology detection method that combines sequence and secondary-structure similarity scores in a discriminative approach. We studied the ability of the method to predict superfamily membership as defined by the SCOP database. We show that a kernel method that combined sequence similarity scores with predicted secondary-structure similarity scores performed similarly to a classifier that used scores calculated from sequences and true secondary structures, performed better than a sequence-only classifier, and achieved a better mean than recently published results on the same data set. We conclude that SVM classifiers trained to predict homology between distantly related proteins become more accurate when a joint sequence/secondary-structure similarity score approach is used.
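
The abstract does not spell out how the two kinds of similarity scores are combined, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes each protein is described by two vectors of similarity scores against a set of reference proteins (one from sequence comparison, one from secondary-structure comparison), and that the joint kernel is a weighted sum of two cosine-normalised linear kernels. The function names, the `weight` parameter, and the toy scores are all hypothetical.

```python
import math


def linear_kernel(u, v):
    """Dot product of two score vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalized_kernel(u, v):
    """Cosine-normalised linear kernel, so each protein has self-similarity 1."""
    denom = math.sqrt(linear_kernel(u, u) * linear_kernel(v, v))
    return linear_kernel(u, v) / denom if denom else 0.0


def combined_kernel(seq_u, ss_u, seq_v, ss_v, weight=0.5):
    """Weighted sum of a sequence-score kernel and a secondary-structure-score kernel.

    `weight` is an assumed mixing parameter: 1.0 uses sequence scores only,
    0.0 uses secondary-structure scores only.
    """
    return (weight * normalized_kernel(seq_u, seq_v)
            + (1.0 - weight) * normalized_kernel(ss_u, ss_v))


def gram_matrix(seq_scores, ss_scores, weight=0.5):
    """Pairwise combined-kernel values for all proteins."""
    n = len(seq_scores)
    return [
        [combined_kernel(seq_scores[i], ss_scores[i],
                         seq_scores[j], ss_scores[j], weight)
         for j in range(n)]
        for i in range(n)
    ]


# Toy, made-up scores: 3 proteins, each scored against 3 reference proteins.
# Proteins 0 and 1 look unrelated by sequence but similar by secondary structure.
seq_scores = [[9.0, 1.0, 0.5], [1.0, 8.0, 0.7], [0.4, 0.6, 7.5]]
ss_scores = [[5.0, 3.5, 0.2], [3.5, 5.0, 0.3], [0.2, 0.3, 5.0]]

K = gram_matrix(seq_scores, ss_scores, weight=0.5)
K_seq_only = gram_matrix(seq_scores, ss_scores, weight=1.0)

for row in K:
    print(["%.3f" % value for value in row])
```

A matrix such as `K` could then be handed to any SVM implementation that accepts a precomputed kernel. In this toy data the joint kernel rates proteins 0 and 1 as more similar than the sequence-only kernel does, which is the kind of signal a joint sequence/secondary-structure approach is meant to add.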

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Area Under Curve
  • Computational Biology / methods
  • Hemoglobins / chemistry
  • Humans
  • Models, Molecular
  • Myoglobin / chemistry
  • Protein Structure, Secondary*
  • Proteins / classification*
  • ROC Curve
  • Sequence Homology, Amino Acid*
  • Software

Substances

  • Hemoglobins
  • Myoglobin
  • Proteins