Modeling gene and genome duplications in eukaryotes

Proc Natl Acad Sci U S A. 2005 Apr 12;102(15):5454-9. doi: 10.1073/pnas.0501102102. Epub 2005 Mar 30.

Abstract

Recent analysis of complete eukaryotic genome sequences has revealed that gene duplication has been rampant. Moreover, in addition to a continuous mode of gene duplication, many eukaryotic organisms have had their complete genome duplicated in their evolutionary past. Such large-scale gene duplication events have been associated with important evolutionary transitions or major leaps in development and adaptive radiations of species. Here, we present an evolutionary model that simulates the duplication dynamics of genes, considering genome-wide duplication events and a continuous mode of gene duplication. By modeling the evolution of the different functional categories of genes, we assess the importance of different duplication events for gene families involved in specific functions or processes. By applying our model to the Arabidopsis genome, for which there is compelling evidence for three whole-genome duplications, we show that gene loss is strikingly different for large-scale and small-scale duplication events and highly biased toward certain functional classes. We provide evidence that some categories of genes were almost exclusively expanded through large-scale gene duplication events. In particular, we show that the three whole-genome duplications in Arabidopsis have been directly responsible for >90% of the increase in transcription factors, signal transducers, and developmental genes in the last 350 million years. Our evolutionary model is widely applicable and can be used to evaluate different assumptions regarding small- or large-scale gene duplication events in eukaryotic genomes.
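
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea: a continuous (small-scale) mode of duplication combined with discrete whole-genome duplication (WGD) events, each with its own loss rate. The names `DuplicationParams`, `simulate`, and `wgd_fraction`, the constant per-step loss rates, and all numeric values are illustrative assumptions, not the authors' method or estimates.

```python
from dataclasses import dataclass


@dataclass
class DuplicationParams:
    # All fields are hypothetical parameters chosen for illustration only.
    birth_rate: float     # small-scale duplications per gene per time step
    ssd_loss_rate: float  # fraction of small-scale duplicates lost per step
    wgd_loss_rate: float  # fraction of WGD-derived duplicates lost per step
    wgd_steps: tuple      # time steps at which the whole genome is duplicated


def simulate(ancestral_genes, n_steps, params):
    """Track expected gene counts through time (deterministic sketch).

    Returns (ancestral, ssd_duplicates, wgd_duplicates) after n_steps.
    Ancestral genes are assumed never to be lost.
    """
    ssd = 0.0
    wgd = 0.0
    for step in range(n_steps):
        # Existing duplicates decay at a mode-specific constant rate.
        ssd *= 1.0 - params.ssd_loss_rate
        wgd *= 1.0 - params.wgd_loss_rate
        # Continuous mode: every gene present can spawn a new duplicate.
        total = ancestral_genes + ssd + wgd
        ssd += params.birth_rate * total
        # Genome-wide event: every gene present gains one extra copy.
        if step in params.wgd_steps:
            wgd += ancestral_genes + ssd + wgd
    return ancestral_genes, ssd, wgd


def wgd_fraction(result):
    """Share of retained duplicates that trace back to a WGD event."""
    _, ssd, wgd = result
    return wgd / (ssd + wgd) if (ssd + wgd) > 0 else 0.0


# Two hypothetical functional categories that differ only in how quickly
# WGD-derived duplicates are lost.
well_retained = DuplicationParams(0.01, 0.20, 0.01, (10, 40, 70))
poorly_retained = DuplicationParams(0.01, 0.20, 0.30, (10, 40, 70))

print(wgd_fraction(simulate(100, 100, well_retained)))
print(wgd_fraction(simulate(100, 100, poorly_retained)))
```

Under these assumptions, a category whose WGD-derived duplicates are lost slowly ends up with most of its expansion attributable to the genome-wide events, whereas a category with rapid WGD loss is shaped mainly by the continuous mode. Fitting such rates to observed duplicate age distributions is beyond what this sketch attempts.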

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis / genetics*
  • Bias
  • Eukaryotic Cells / metabolism*
  • Evolution, Molecular*
  • Gene Duplication*
  • Genes, Duplicate / genetics*
  • Genes, Plant / genetics
  • Genome, Plant*
  • Models, Genetic*
  • Population Dynamics
  • Sequence Homology, Nucleic Acid
  • Time Factors