Detecting coevolving amino acid sites using Bayesian mutational mapping

Bioinformatics. 2005 Jun;21 Suppl 1:i126-35. doi: 10.1093/bioinformatics/bti1032.

Abstract

Motivation: The evolution of protein sequences is constrained by complex interactions between amino acid residues. Because harmful substitutions may be compensated for by other substitutions at neighboring sites, residues can coevolve. We describe a Bayesian phylogenetic approach to detecting coevolving residues in protein families. This method, Bayesian mutational mapping (BMM), stochastically assigns mutations to the branches of the evolutionary tree and then calculates test statistics to determine whether the mapping contains a coevolutionary signal. Posterior predictive P-values provide an estimate of significance, and specificity is maintained by integrating over uncertainty in the estimates of the tree topology, branch lengths and substitution rates. We also describe a coevolutionary Markov model for codon substitution, which serves as the basis of several test statistics.
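The mapping and significance steps described above can be illustrated with a minimal Python sketch. It assumes a single site evolving under a continuous-time Markov rate matrix `Q` (a list of lists whose rows sum to zero) and uses simple rejection sampling to draw a substitution history on one branch, conditioned on the states at both ends. This is one common way to do stochastic mapping, not necessarily the sampler used in BMM. The helper `posterior_predictive_p` assumes that each posterior draw yields one statistic computed on the mapped observed data and one computed on data simulated from that draw. Both function names are hypothetical.

```python
import random


def sample_branch_history(Q, start, end, t, rng, max_tries=10_000):
    """Draw substitution events on a branch of length t, conditioned on its end states.

    Forward-simulates the Markov chain from `start` and keeps the first path
    that finishes in `end` (rejection sampling). Returns (time, new_state) pairs.
    """
    n = len(Q)
    for _ in range(max_tries):
        state, time, events = start, 0.0, []
        while True:
            rate = -Q[state][state]  # total rate of leaving the current state
            if rate <= 0:
                break  # absorbing state: no more substitutions on this branch
            time += rng.expovariate(rate)  # exponential waiting time
            if time > t:
                break  # next substitution would fall past the branch end
            # Pick the new state in proportion to the off-diagonal rates.
            weights = [Q[state][j] if j != state else 0.0 for j in range(n)]
            state = rng.choices(range(n), weights=weights)[0]
            events.append((time, state))
        if state == end:
            return events
    raise RuntimeError("no sampled path ended in the required state")


def posterior_predictive_p(observed, predicted):
    """Share of posterior draws whose predictive statistic >= the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty lists of statistics")
    hits = sum(1 for obs, pred in zip(observed, predicted) if pred >= obs)
    return hits / len(observed)


if __name__ == "__main__":
    rng = random.Random(1)
    Q = [[-1.0, 1.0], [1.0, -1.0]]  # toy symmetric two-state rate matrix
    print(sample_branch_history(Q, start=0, end=1, t=0.5, rng=rng))
    print(posterior_predictive_p([5, 5, 5, 5], [1, 6, 2, 7]))  # 0.5
```

Rejection sampling is easy to check but wastes many draws on long branches or improbable end states. A small P-value from `posterior_predictive_p` means the observed statistic is rarely matched or exceeded by data simulated under the fitted model.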

Results: Results on simulated coevolutionary data indicate that the BMM method can detect nearly all coevolving sites when the model is correctly specified, and that non-parametric statistics such as mutual information are generally less powerful than parametric statistics. On a dataset of eukaryotic proteins from the phosphoglycerate kinase (PGK) family, interdomain site contacts yield a significantly greater coevolutionary signal than interdomain non-contacts, an indication that the method provides information about interacting sites. Failing to account for rate heterogeneity across sites in PGK made the test less discriminating, markedly increasing the number of reported positives at both contact and non-contact sites.
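Mutual information, one of the non-parametric statistics mentioned above, can be sketched in Python for illustration. This version assumes it is computed directly from two alignment columns given as equal-length strings (one residue per sequence) using plug-in frequency estimates, with no small-sample correction; how BMM applies mutual information to mapped mutations may differ. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two aligned columns.

    Each column holds one residue per sequence; frequencies are plug-in
    estimates from the columns themselves.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))  # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_joint = c / n
        p_indep = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi


if __name__ == "__main__":
    print(round(mutual_information("AAGG", "CCTT"), 4))  # perfectly paired: 0.6931
    print(mutual_information("AAGG", "CTCT"))  # independent: 0.0
```

These plug-in estimates treat the sequences as independent samples, so the score is not corrected for their shared ancestry.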

Supplementary information: http://www.dimmic.net/supplement/

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acids / chemistry*
  • Bayes Theorem
  • Binding Sites
  • Chromosome Mapping / methods*
  • Computational Biology / methods*
  • DNA Mutational Analysis*
  • Evolution, Molecular
  • Likelihood Functions
  • Markov Chains
  • Models, Statistical
  • Multigene Family
  • Mutation
  • Phosphoglycerate Kinase / genetics

Substances

  • Amino Acids
  • Phosphoglycerate Kinase