Consensus alignment for reliable framework prediction in homology modeling

Bioinformatics. 2003 Sep 1;19(13):1682-91. doi: 10.1093/bioinformatics/btg211.

Abstract

Motivation: Even the best sequence alignment methods frequently fail to correctly identify the framework regions for which backbones can be copied from the template into the target structure. Since the underprediction and, more significantly, the overprediction of these regions reduces the quality of the final model, it is of prime importance to attain as much as possible of the true structural alignment between target and template.

Results: We have developed an algorithm called Consensus that consistently provides a high quality alignment for comparative modeling. The method follows from a benchmark analysis of the 3D models generated by ten alignment techniques for a set of 79 homologous protein structure pairs. For 20-to-40% of the targets, these methods yield models with at least 6 A root mean square deviation (RMSD) from the native structure. We have selected the top five performing methods, and developed a consensus algorithm to generate an improved alignment. By building on the individual strength of each method, a set of criteria was implemented to remove the alignment segments that are likely to correspond to structurally dissimilar regions. The automated algorithm was validated on a different set of 48 protein pairs, resulting in 2.2 A average RMSD for the predicted models, and only four cases in which the RMSD exceeded 3 A. The average length of the alignments was about 75% of that found by standard structural superposition methods. The performance of Consensus was consistent from 2 to 32% target-template sequence identity, and hence it can be used for accurate prediction of framework regions in homology modeling.

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Consensus Sequence*
  • Gene Expression Profiling / methods*
  • Models, Molecular*
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Proteins / classification
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid*
  • Software

Substances

  • Proteins