A set X of 20 trinucleotides has been identified in genes of bacteria, eukaryotes, plasmids and viruses; on average, it has the highest occurrence in the reading frame compared to the two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property: it is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property of always retrieving, synchronizing and maintaining the reading frame in genes. In this paper, we develop several statistical analyses of X motifs in 138 available complete genomes of eukaryotes, in which genes as well as non-gene regions are examined. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in genomes of eukaryotes compared to their 23 large bijective motifs, their two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the genome of Solanum pennellii with a length of 155 trinucleotides (465 nucleotides) and an expectation E = 10⁻⁷¹. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13 with a length of 36 trinucleotides and an expectation E = 10⁻¹¹. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes using the circular code for translation. However, the genes/non-genes ratio of the proportion of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) in the 138 complete eukaryotic genomes is about 8. Thus, the X motifs occur preferentially in genes, as expected from the work of the previous 20 years.
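The two thresholds that define an X motif above (a minimum number of consecutive trinucleotides of X, and a minimum number of different trinucleotides of X) can be illustrated with a minimal Python sketch. It is not the method used in the paper: the function name `find_x_motifs`, the treatment of motifs as maximal runs in each of the three frames, and the returned tuple layout are assumptions made for illustration. The 20 trinucleotides listed are the circular code X as published by Arquès and Michel (1996); they are not enumerated in this abstract.

```python
# The circular code X of Arquès and Michel (1996): 20 trinucleotides.
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def find_x_motifs(seq, min_len=15, min_distinct=10):
    """Return (start, length, distinct) for each maximal run of X trinucleotides.

    Hypothetical helper: start is the 0-based nucleotide position, length is
    the number of consecutive trinucleotides of X, and distinct is the number
    of different trinucleotides of X in the run. Defaults follow the "large
    X motif" thresholds quoted in the abstract (15 and 10).
    """
    seq = seq.upper()
    motifs = []
    for frame in range(3):  # scan each of the three possible frames
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = 0
        while start < len(codons):
            if codons[start] not in X:
                start += 1
                continue
            # Extend the run while trinucleotides stay inside X.
            end = start
            while end < len(codons) and codons[end] in X:
                end += 1
            run = codons[start:end]
            if len(run) >= min_len and len(set(run)) >= min_distinct:
                motifs.append((frame + 3 * start, len(run), len(set(run))))
            start = end
    return sorted(motifs)
```

Calling `find_x_motifs(sequence, min_len=10, min_distinct=5)` would apply the looser thresholds used for the genes/non-genes comparison. Computing the expectation E and the bijective, permuted and random comparisons is outside the scope of this sketch.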
Keywords: Bijective transformation circular code; Circular code; Circular code motifs; Genomes of eukaryotes; Permuted circular code.
Copyright © 2016 Elsevier Ltd. All rights reserved.