Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics

J Gregory Caporaso; Sandra Smit; Brett C Easton; Lawrence Hunter; Gavin A Huttley; Rob Knight

doi:10.1186/1471-2148-8-327

BMC Evol Biol. 2008 Dec 3;8:327. doi: 10.1186/1471-2148-8-327.

Authors

J Gregory Caporaso¹, Sandra Smit, Brett C Easton, Lawrence Hunter, Gavin A Huttley, Rob Knight

Affiliation

¹ Department of Chemistry and Biochemistry, University of Colorado at Boulder, Boulder, CO, USA.

Abstract

Background: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance.
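
To make the idea of a tree-ignorant metric concrete, here is a minimal sketch of Mutual Information computed between two alignment columns using column frequencies alone, with no phylogeny involved. The function names and the toy alignment are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def mutual_information(col_a, col_b):
    """MI(A; B) = H(A) + H(B) - H(A, B), estimated from observed frequencies."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same set of sequences")
    joint = list(zip(col_a, col_b))
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Toy alignment (made-up data): rows are sequences, columns are positions.
alignment = ["AKE", "AKE", "GRD", "GRD"]
columns = list(zip(*alignment))

mi_01 = mutual_information(columns[0], columns[1])
```

In this toy alignment positions 0 and 1 vary in lockstep, so their MI equals the entropy of either column. Such a raw score cannot distinguish coevolution from shared ancestry, which is the confounding the abstract refers to.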

Results: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding, however, indicating that the choice of alphabet is critical.
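
Recoding with a reduced-state alphabet can be sketched as a simple character mapping. The three-state charge grouping below (K and R positive, D and E negative, everything else uncharged) is an assumption chosen for illustration; the abstract does not list the alphabets the authors actually used.

```python
# Hypothetical charge-based grouping; not the alphabet definition from the paper.
CHARGE_GROUPS = {
    "p": "KR",  # positively charged side chains
    "n": "DE",  # negatively charged side chains
}
UNCHARGED = "u"
GAP = "-"


def build_recoding_map(groups):
    """Invert {state: residues} into {residue: state}."""
    return {residue: state for state, residues in groups.items() for residue in residues}


def recode(sequence, mapping):
    """Replace each residue with its reduced state; gaps are left untouched."""
    return "".join(
        GAP if ch == GAP else mapping.get(ch, UNCHARGED)
        for ch in sequence.upper()
    )


mapping = build_recoding_map(CHARGE_GROUPS)
recoded = recode("MKDEL-R", mapping)
```

Reducing twenty states to three shrinks the joint frequency table for a pair of columns from 400 cells to 9, which is one reason recoding can help both statistically and computationally. It also discards any signal not captured by the chosen grouping, consistent with the observation that recoding does not always improve performance.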

Conclusion: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.
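
The abstract does not define the t-test transformation, so the following is not the authors' method. It is a simpler stand-in, a z-score against a background, that illustrates the general idea of judging a pair's score relative to a null built from the same alignment: the raw score for a pair is standardised against the scores of all other pairs that involve either of its two positions. The function name and the toy scores are hypothetical.

```python
from statistics import mean, stdev


def standardise_pair(scores, i, j):
    """Z-score of pair (i, j) against all other pairs involving i or j.

    `scores` maps position pairs (a, b) to a raw coevolution score.
    This is an illustrative normalisation, not the published transformation.
    """
    target = scores[(min(i, j), max(i, j))]
    background = [
        s
        for (a, b), s in scores.items()
        if (i in (a, b) or j in (a, b)) and {a, b} != {i, j}
    ]
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    spread = stdev(background)
    if spread == 0:
        raise ValueError("background scores have zero variance")
    return (target - mean(background)) / spread


# Made-up raw scores for four alignment positions.
raw_scores = {
    (0, 1): 1.0,
    (0, 2): 0.2,
    (0, 3): 0.1,
    (1, 2): 0.3,
    (1, 3): 0.2,
    (2, 3): 0.1,
}

z_01 = standardise_pair(raw_scores, 0, 1)
z_23 = standardise_pair(raw_scores, 2, 3)
```

Because the background is drawn from the same sequences as the pair being tested, any inflation of scores caused by shared ancestry is present in both and partly cancels. That is the intuition behind controlling for phylogenetic signal without a tree.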

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Evolution, Molecular*
Models, Genetic
Models, Statistical*
Myoglobin / genetics
Myosins / genetics
Phylogeny*
Protein Structure, Secondary
Sequence Alignment
Sequence Analysis, Protein

Substances

Myoglobin
Myosins
