Genomic signature: characterization and classification of species assessed by chaos game representation of sequences

Mol Biol Evol. 1999 Oct;16(10):1391-9. doi: 10.1093/oxfordjournals.molbev.a026048.

Abstract

We explored DNA structures of genomes by means of a new tool derived from the "chaotic dynamical systems" theory (the so-called chaos game representation [CGR]), which allows the depiction of frequencies of oligonucleotides in the form of images. Using CGR, we observe that subsequences of a genome exhibit the main characteristics of the whole genome, attesting to the validity of the genomic signature concept. Base concentrations, stretches (runs of complementary bases or purines/pyrimidines), and patches (over- or underexpressed words of various lengths) are the main factors explaining the variability observed among sequences. The distance between images may be considered a measure of phylogenetic proximity. Eukaryotes and prokaryotes can be identified merely on the basis of their DNA structures.

MeSH terms

  • Algorithms
  • Animals
  • Classification
  • Computer Simulation
  • DNA / analysis*
  • DNA / genetics
  • Evolution, Molecular
  • Genome*
  • Humans
  • Image Processing, Computer-Assisted
  • Phylogeny
  • Species Specificity

Substances

  • DNA