Circular code motifs in genomes of eukaryotes

J Theor Biol. 2016 Nov 7;408:198-212. doi: 10.1016/j.jtbi.2016.07.022. Epub 2016 Jul 19.

Abstract

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses. On average, this set has the highest occurrence in the reading frame compared with its two shifted frames (Michel, 2015; Arquès and Michel, 1996). X also has an interesting mathematical property: it is a circular code (Arquès and Michel, 1996). Thus, motifs from this circular code X, called X motifs, have the property of always retrieving, synchronizing and maintaining the reading frame in genes. In this paper, we develop several statistical analyses of X motifs in 138 available complete eukaryotic genomes, examining both genes and non-gene regions. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in eukaryotic genomes compared with their 23 large bijective motifs, their two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the Solanum pennellii genome with a length of 155 trinucleotides (465 nucleotides) and an expectation E = 10^-71. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13, with a length of 36 trinucleotides and an expectation E = 10^-11. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes that used the circular code for translation. However, across the 138 complete eukaryotic genomes, the proportion of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) is about 8 times higher in genes than in non-gene regions. Thus, X motifs occur preferentially in genes, as expected from 20 years of previous work.
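The notions in the abstract (circular code, permuted and bijective codes, X motifs defined by a minimum length and a minimum composition) can be illustrated with a small sketch. It rests on several assumptions that are not stated in the abstract. The 20 trinucleotides of X are quoted from the circular-code literature (Arquès and Michel, 1996) because the abstract does not list them. Circularity is tested with the known graph characterization: a trinucleotide code is circular iff the graph with edges N1 -> N2N3 and N1N2 -> N3 is acyclic. The "bijective" codes are assumed to be the 23 = 4! - 1 non-identity relabelings of the nucleotides A, C, G, T. Motifs are taken as maximal runs of X trinucleotides in each frame of a single strand. All function names (`is_circular`, `permuted`, `bijective_transforms`, `consistent_frames`, `find_motifs`) are hypothetical. This is not the authors' pipeline, and it does not compute the expectation E.

```python
from itertools import permutations

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
# Assumption: quoted from the literature; the abstract does not enumerate them.
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def is_circular(code):
    """Build the graph with edges N1 -> N2N3 and N1N2 -> N3 for every
    trinucleotide N1N2N3 in code; the code is circular iff the graph is acyclic."""
    graph = {}
    for t in code:
        graph.setdefault(t[0], set()).add(t[1:])
        graph.setdefault(t[:2], set()).add(t[2])
    state = {}  # absent = unvisited, 1 = on the DFS stack, 2 = finished

    def has_cycle(node):
        state[node] = 1
        for nxt in graph.get(node, ()):
            if state.get(nxt) == 1 or (nxt not in state and has_cycle(nxt)):
                return True
        state[node] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(graph))


def permuted(code, k):
    """Permuted code: rotate each trinucleotide left by k (k=1: N1N2N3 -> N2N3N1)."""
    return frozenset(t[k:] + t[:k] for t in code)


def bijective_transforms(code):
    """Assumed meaning of 'bijective': relabel nucleotides with each of the
    23 non-identity permutations of ACGT."""
    codes = []
    for perm in permutations("ACGT"):
        image = "".join(perm)
        if image != "ACGT":
            table = str.maketrans("ACGT", image)
            codes.append(frozenset(t.translate(table) for t in code))
    return codes


def consistent_frames(word, code=X):
    """Frames f in {0, 1, 2} in which every complete trinucleotide of word is in code."""
    return [f for f in range(3)
            if all(word[i:i + 3] in code for i in range(f, len(word) - 2, 3))]


def find_motifs(seq, code=X, min_len=15, min_comp=10):
    """Return (frame, start, length, composition) for each maximal run of code
    trinucleotides with >= min_len trinucleotides and >= min_comp distinct ones."""
    hits = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        j = 0
        while j < len(codons):
            if codons[j] not in code:
                j += 1
                continue
            k = j
            while k < len(codons) and codons[k] in code:
                k += 1  # extend the run while trinucleotides stay in code
            run = codons[j:k]
            if len(run) >= min_len and len(set(run)) >= min_comp:
                hits.append((frame, frame + 3 * j, len(run), len(set(run))))
            j = k
    return hits


if __name__ == "__main__":
    motif = "".join(sorted(X))       # 20 consecutive X trinucleotides, 20 distinct
    seq = "T" * 9 + motif + "T" * 9  # flanked by TTT, which is not in X
    print(find_motifs(seq))          # [(0, 9, 20, 20)]
```

In the usage example, the motif is found only in frame 0. The shifted frames break into trinucleotides outside X almost immediately, which is the frame-retrieval property that circularity provides. The smaller motifs mentioned in the abstract correspond to `find_motifs(seq, min_len=10, min_comp=5)`.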

Keywords: Bijective transformation circular code; Circular code; Circular code motifs; Genomes of eukaryotes; Permuted circular code.

MeSH terms

  • DNA, Circular
  • Eukaryota / genetics*
  • Genome / genetics
  • Nucleotide Motifs / genetics*
  • Reading Frames / genetics

Substances

  • DNA, Circular