An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited

J Mol Biol. 1995 Jun 16;249(4):816-31. doi: 10.1006/jmbi.1995.0340.

Abstract

The sensitivity of most protein sequence alignment methods depends strongly on the quality of the comparison matrices used. These matrices, which assign weights or similarity scores to every possible amino acid substitution pair, are used to differentiate among the various possible alignments of two or more sequences. There are many ways to generate these exchange weights, and new matrices are constantly published. There has been no overall assessment of these matrices when applied with different alignment techniques, over many protein folds and families, both close and distant, and with several gap penalty values. In this work, a set of amino acid sequences matched by superposition of known protein tertiary topologies is used to test the alignment accuracy of the different method/matrix/penalty combinations. The comparisons show relatively similar results for the top scoring matrices, a preference for the global alignment method of Needleman and Wunsch, and the importance of matrix modification and optimized gap penalties. The relationship between the percentage identity in a resulting alignment and the level of correctness to be expected is given for the top-performing matrix, resulting in a better definition of the so-called "twilight zone". Estimates are made for the probability that two sequences, aligned at a certain level of residue percentage identity, are in fact unrelated.
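
As an illustration of the ingredients named in the abstract (an exchange matrix, a gap penalty, global alignment, and percentage identity), the following is a minimal Python sketch. It is not the authors' implementation. The scoring function `toy_score` (+1 for identical residues, -1 otherwise), the linear gap penalty of -2, and the choice to compute identity over aligned residue pairs only (gap columns excluded) are all assumptions made for this example; a real assessment would substitute a published exchange matrix and tuned gap penalties.

```python
def toy_score(x, y):
    """Hypothetical stand-in for an exchange matrix lookup (assumption)."""
    return 1 if x == y else -1


def needleman_wunsch(a, b, score=toy_score, gap=-2):
    """Global alignment with a linear gap penalty (assumed, not from the paper).

    Returns (alignment score, aligned a, aligned b).
    """
    n, m = len(a), len(b)
    # F[i][j] is the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                F[i - 1][j] + gap,                            # gap in b
                F[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical pairs as a percentage of aligned (non-gap) pairs (assumed definition)."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    s, al_a, al_b = needleman_wunsch("ACDE", "ADE")
    print(s, al_a, al_b, percent_identity(al_a, al_b))
```

Swapping `toy_score` for a lookup into a different matrix, or changing `gap`, changes which alignment scores highest; this is the method/matrix/penalty dependence that the study evaluates against structure-based reference alignments.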

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Databases, Factual
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Sequence Alignment / methods*

Substances

  • Amino Acids
  • Proteins