TreeFix: statistically informed gene tree error correction using species trees

Syst Biol. 2013 Jan 1;62(1):110-20. doi: 10.1093/sysbio/sys076. Epub 2012 Sep 4.

Abstract

Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a "statistically equivalent" gene tree that minimizes a species tree-based cost function. We have applied TreeFix to two clades, comprising 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies, and we show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.
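
The selection step described above (keep only gene trees that are "statistically equivalent" to the input tree, then choose the one with the lowest species tree-based cost) can be illustrated with a toy sketch. This is not TreeFix's code or interface. The abstract does not name the statistical test or the cost function, so the sketch assumes two stand-ins: a fixed log-likelihood threshold (the hypothetical `max_loglik_drop`) in place of a real likelihood test, and the number of duplications inferred by LCA reconciliation as the cost. Trees are nested 2-tuples, gene names follow an assumed `species_id` pattern, and all function names are hypothetical.

```python
# Toy sketch only: trees are nested 2-tuples of leaf names (assumed encoding).
# Gene leaves are assumed to be named "<species>_<id>", e.g. "A_1".

def clades(tree):
    """All clades (frozensets of leaf names) of a binary tree; the root clade is last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = clades(left), clades(right)
    return cl + cr + [cl[-1] | cr[-1]]

def lca(species_clades, species_set):
    """Smallest species-tree clade containing species_set (the LCA mapping)."""
    return min((c for c in species_clades if species_set <= c), key=len)

def count_duplications(gene_tree, species_clades, species_of):
    """Return (species clade the node maps to, duplications in the subtree)."""
    if isinstance(gene_tree, str):
        return lca(species_clades, frozenset([species_of(gene_tree)])), 0
    (ml, dl), (mr, dr) = (
        count_duplications(child, species_clades, species_of) for child in gene_tree
    )
    m = lca(species_clades, ml | mr)
    # A node is a duplication if it maps to the same species node as a child.
    return m, dl + dr + (1 if m in (ml, mr) else 0)

def pick_tree(candidates, species_tree, max_loglik_drop,
              species_of=lambda gene: gene.split("_")[0]):
    """candidates: list of (gene_tree, log_likelihood) pairs.

    Assumption: trees within max_loglik_drop of the best log-likelihood are
    treated as equivalent. This threshold is a placeholder, not a real test.
    """
    sc = clades(species_tree)
    best_ll = max(ll for _, ll in candidates)
    equivalent = [(t, ll) for t, ll in candidates if best_ll - ll <= max_loglik_drop]
    return min(equivalent, key=lambda c: count_duplications(c[0], sc, species_of)[1])

# Made-up example: species tree ((A,B),C) and three candidate gene trees.
species_tree = (("A", "B"), "C")
candidates = [
    ((("A_1", "C_1"), "B_1"), -100.0),   # best likelihood, implies 1 duplication
    ((("A_1", "B_1"), "C_1"), -100.8),   # nearly as likely, implies 0 duplications
    (("A_1", ("B_1", "C_1")), -120.0),   # much worse likelihood, filtered out
]
chosen_tree, chosen_ll = pick_tree(candidates, species_tree, max_loglik_drop=2.0)
```

In this made-up example the second candidate is chosen: it fits the sequence data almost as well as the best-scoring tree but agrees with the species tree without requiring a duplication. A full method would also need to generate the candidate topologies, which this sketch leaves out.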

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Classification / methods*
  • Drosophila / classification
  • Drosophila / genetics
  • Fungi / classification
  • Fungi / genetics
  • Phylogeny*
  • Reproducibility of Results
  • Software*