Evolutionary simulations to detect functional lineage-specific genes

Bioinformatics. 2006 Aug 1;22(15):1815-22. doi: 10.1093/bioinformatics/btl280. Epub 2006 Jun 9.

Abstract

Motivation: Establishing that recent duplicate gene copies are functional is usually difficult, because high sequence similarity between the duplicates and shallow phylogenies hamper both statistical and experimental inference.

Results: We developed an integrated evolutionary approach to identify functional duplicate gene copies and other lineage-specific genes. By repeatedly simulating neutral evolution, our method estimates the probability that an ORF was selectively conserved and is therefore likely to represent a bona fide coding region. In parallel, our method tests whether the accumulation of non-synonymous substitutions reveals signatures of selective constraint. Using simulated and real data, we show that our approach has high power to identify functional lineage-specific genes. For example, a coding region of average length (approximately 1400 bp) that is restricted to hominoids can be predicted to be functional in approximately 94-100% of cases. Notably, the method may support functionality in instances where classical selection tests based on the ratio of non-synonymous to synonymous substitutions fail to reveal signatures of selection. Our method is available as an automated tool, ReEVOLVER, which should also be useful for systematically detecting functional lineage-specific genes in closely related species on a large scale.
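The idea of repeatedly simulating neutral evolution to ask how often an ORF would survive by chance can be illustrated with a minimal sketch. This is not ReEVOLVER's implementation: the function names, the uniform point-substitution model, the fixed number of substitutions per replicate, and the omission of indels are all simplifying assumptions made here for illustration. The sketch mutates an ORF without any selection and reports the fraction of replicates in which no premature stop codon appears; a small fraction would suggest that an intact ORF is unlikely under neutrality.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def has_premature_stop(seq):
    """Return True if any in-frame codon before the final codon is a stop."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 3, 3))


def evolve_neutrally(seq, n_subs, rng):
    """Apply n_subs random point substitutions with no selection.

    Assumption: sites and replacement bases are chosen uniformly,
    and the same site may be hit more than once.
    """
    s = list(seq)
    for _ in range(n_subs):
        pos = rng.randrange(len(s))
        s[pos] = rng.choice([b for b in BASES if b != s[pos]])
    return "".join(s)


def prob_orf_intact(seq, divergence, n_sims=1000, seed=0):
    """Estimate the probability that an ORF stays free of premature stops
    after neutral evolution at the given divergence (substitutions per site).
    """
    rng = random.Random(seed)
    n_subs = round(divergence * len(seq))  # hypothetical fixed count per replicate
    intact = sum(
        not has_premature_stop(evolve_neutrally(seq, n_subs, rng))
        for _ in range(n_sims)
    )
    return intact / n_sims


if __name__ == "__main__":
    # Toy ORF (hypothetical sequence): start codon, 50 alanine codons, stop codon.
    orf = "ATG" + "GCT" * 50 + "TAA"
    print(prob_orf_intact(orf, divergence=0.05))
```

In a realistic setting, the substitution model, the divergence, and the treatment of indels would need to reflect the lineage under study; the sketch only shows the overall shape of the simulation.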

Availability: ReEVOLVER is available at http://www.unil.ch/cig/page7858.html.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Cell Lineage / genetics*
  • Chromosome Mapping / methods*
  • Conserved Sequence
  • DNA Mutational Analysis / methods
  • Evolution, Molecular*
  • Hominidae / genetics*
  • Humans
  • Linkage Disequilibrium / genetics*
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Phylogeny
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid