Amino acid substitution matrices from an artificial neural network model

J Comput Biol. 2001;8(5):471-81. doi: 10.1089/106652701753216495.

Abstract

An amino acid substitution matrix specifies the probability of substitution for each pair of the 20 amino acids. Log-odds scores derived from these probabilities are widely used to construct protein sequence alignments. Any given substitution matrix is suited to matching sequences that have diverged by a specific evolutionary distance. However, for a given set of sequences, it is not always clear which matrix should be used. We used an artificial neural network model to predict probabilities of amino acid substitutions using alignment samples of different evolutionary distances. From the internal description learned by the network, substitution matrices suitable for detecting relationships at any chosen evolutionary distance can be generated instantly. Because it uses the additional information of evolutionary distance, our neural network model achieves a lower average cross entropy error than a series of BLOSUM and PET matrices over all testing sets, indicating that it predicts amino acid substitution probabilities more accurately.
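
The two quantities the abstract relies on, log-odds scores and cross entropy error, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the half-bit scaling, the three-letter toy alphabet, and every number below are assumptions made for illustration, and the cross entropy shown is one common definition that may differ in detail from the one used in the paper.

```python
import math


def log_odds_score(p_b_given_a, q_b, scale=2.0):
    """Log-odds score for substituting residue a with residue b.

    p_b_given_a: substitution probability P(b | a)
    q_b: background frequency of residue b
    scale: 2.0 yields half-bit units (an assumed convention)
    """
    return scale * math.log2(p_b_given_a / q_b)


def cross_entropy(predicted, observed):
    """Cross entropy -sum_b observed[b] * ln(predicted[b]).

    Terms with observed[b] == 0 contribute nothing and are skipped.
    """
    return -sum(o * math.log(p) for p, o in zip(predicted, observed) if o > 0)


# Made-up toy numbers over a three-letter alphabet (not real amino acid data).
background = [0.5, 0.25, 0.25]  # hypothetical background frequencies q_b
predicted = [0.8, 0.1, 0.1]     # hypothetical model output P(b | a) at one distance
observed = [0.7, 0.2, 0.1]      # hypothetical frequencies in a test alignment

# One row of a log-odds matrix, derived from the predicted probabilities.
scores = [log_odds_score(p, q) for p, q in zip(predicted, background)]

# Lower cross entropy means the predicted distribution is closer to the observed one.
error = cross_entropy(predicted, observed)

print("log-odds row:", [round(s, 2) for s in scores])
print("cross entropy:", round(error, 4))
```

A positive score marks a substitution that is predicted to occur more often than chance given the background frequencies, and a negative score marks one predicted to occur less often. Comparing models by average cross entropy over test alignments, as the abstract describes, amounts to averaging the second function over all residues and test samples.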

MeSH terms

  • Amino Acid Substitution*
  • Models, Molecular*
  • Neural Networks, Computer*
  • Proteins*

Substances

  • Proteins