Reconstructing ancestral gene content by coevolution

Genome Res. 2010 Jan;20(1):122-32. doi: 10.1101/gr.096115.109. Epub 2009 Nov 30.

Abstract

Inferring the gene content of ancestral genomes is a fundamental challenge in molecular evolution. Because of the statistical nature of this problem, ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) methods are prone to considerable error. In general, these errors are difficult to eliminate by using longer genomic sequences or by analyzing more taxa. This study describes a new approach for improving ancestral genome reconstruction, the ancestral coevolver (ACE), which uses coevolutionary information to improve the accuracy of such reconstructions over previous approaches. The principal idea is to reduce the potentially large solution space by choosing a single optimal (or near-optimal) solution that is in accord with the coevolutionary relationships between protein families. Simulation experiments, on both artificial and real biological data, show that ACE yields a marked decrease in error rate compared with ML or MP. Applied to a large data set (95 organisms, 4873 protein families, and 10,000 coevolutionary relationships), some of the ancestral genomes reconstructed by ACE differed markedly in gene content from those reconstructed by ML or MP alone (by more than 10% in some nodes). These reconstructions, while having nearly the same likelihood/parsimony scores as those obtained with ML/MP, had markedly higher concordance with the coevolutionary information. Specifically, when ACE was applied to improve the results of ML, it added a large number of proteins to those encoded by LUCA (last universal common ancestor), most of them ribosomal proteins and components of the F(0)F(1)-type ATP synthase/ATPases, complexes that are vital in most living organisms. Our analysis suggests that LUCA was bacterial-like and had a genome size similar to the genome sizes of many extant organisms.
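The principal idea stated in the abstract, choosing among equally good reconstructions the one that best agrees with coevolving protein families, can be illustrated with a toy sketch. This is not the ACE algorithm, whose details the abstract does not give. It assumes a hypothetical four-leaf tree (`PARENT`), binary presence/absence states, a brute-force parsimony search, and a made-up concordance score that counts ancestral nodes where two coevolving families share the same state.

```python
from itertools import product

# Hypothetical toy tree (child -> parent); "R" is the root. Not from the paper.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "R", "Y": "R"}
INTERNAL = ["R", "X", "Y"]


def parsimony_cost(states):
    """Count gain/loss events: edges whose two endpoints differ in state."""
    return sum(states[child] != states[parent] for child, parent in PARENT.items())


def optimal_labelings(leaves):
    """Return every labeling of the internal nodes with minimal parsimony cost.

    Brute force over all 2^3 assignments; fine for a toy tree only.
    """
    scored = []
    for values in product((0, 1), repeat=len(INTERNAL)):
        states = {**leaves, **dict(zip(INTERNAL, values))}
        scored.append((parsimony_cost(states), states))
    best = min(cost for cost, _ in scored)
    return [states for cost, states in scored if cost == best]


def concordance(states1, states2):
    """Assumed score: number of ancestral nodes where both families agree."""
    return sum(states1[node] == states2[node] for node in INTERNAL)


def pick_coevolution_consistent(leaves1, leaves2):
    """Among equally parsimonious pairs of labelings, keep the most concordant."""
    candidates = product(optimal_labelings(leaves1), optimal_labelings(leaves2))
    return max(candidates, key=lambda pair: concordance(*pair))


# Family 1 is ambiguous under parsimony alone; family 2 is present everywhere.
family1 = {"A": 1, "B": 0, "C": 1, "D": 0}
family2 = {"A": 1, "B": 1, "C": 1, "D": 1}
rec1, rec2 = pick_coevolution_consistent(family1, family2)
```

In this example, parsimony alone leaves family 1 with two equally good reconstructions (absent or present at every ancestral node). Treating the two families as coevolving breaks the tie in favor of presence at the root, which mirrors, in miniature, how coevolutionary information can add proteins to an inferred ancestor without worsening the parsimony score.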

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Escherichia coli Proteins / genetics*
  • Escherichia coli Proteins / metabolism
  • Evolution, Molecular*
  • Fossils*
  • Genome, Bacterial*
  • Genome, Fungal*
  • Genome, Plant*
  • Plant Proteins / genetics
  • Plant Proteins / metabolism
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / genetics*
  • Saccharomyces cerevisiae Proteins / metabolism
  • Trees / genetics

Substances

  • Escherichia coli Proteins
  • Plant Proteins
  • Saccharomyces cerevisiae Proteins