Entropies of coding and noncoding sequences of DNA and proteins

Biophys Chem. 1992 Jan;42(1):7-11. doi: 10.1016/0301-4622(92)80002-m.

Abstract

The entropies of protein-coding genes from Escherichia coli were calculated according to Boltzmann's formula. Entropies of the coding regions were compared to those of noncoding or miscoding ones. With nucleotides as code units, the entropies of the coding regions were lower than the entropies of complete sequences (leader, coding region, and trailer), but with only marginal statistical significance. With triplets of nucleotides as code units, the entropies of correct reading frames were significantly lower than the entropies of the +1 and -1 frameshifts. With amino acids as code units, the results were the opposite: biologically functional proteins had significantly higher entropies than proteins translated from the frameshifted sequences. We attempt to explain this paradox with the hypothesis that the genetic code may be able to lower the information content (increase the entropy) of proteins while translating them from DNA. This ability might be beneficial to bacteria because it would make functional proteins more probable (having a higher entropy) than nonfunctional proteins translated from frameshifted sequences.
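
As a rough illustration of the comparison described above, the following sketch computes an entropy over single nucleotides and over triplets read in different frames. The abstract does not state the exact form of the formula used, so the sketch assumes the frequency-based expression H = -sum(p_i * log2(p_i)) over the observed code units. The function names `entropy` and `triplets`, the toy sequence, and the use of start offsets 1 and 2 as stand-ins for shifted reading frames are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(units):
    """Entropy in bits of the observed frequency distribution of code units.

    Assumed form: H = -sum(p_i * log2(p_i)); the paper's exact formula may differ.
    """
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def triplets(seq, frame=0):
    """Split seq into complete, non-overlapping triplets starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    # Toy sequence for illustration only, not data from the study.
    seq = "ATGGCTAAAGCTGCTAAAGCTTAA"
    print("nucleotides:", entropy(seq))
    for frame in (0, 1, 2):
        print("triplets, offset", frame, ":", entropy(triplets(seq, frame)))
```

The same `entropy` function could be applied to a list of amino acids once each reading frame has been translated. Translation needs a codon table and is left out here to keep the sketch short.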

MeSH terms

  • Amino Acids / genetics
  • Bacterial Proteins / genetics*
  • DNA, Bacterial / analysis
  • DNA, Bacterial / genetics*
  • Escherichia coli / genetics*
  • Genes, Bacterial*
  • Protein Biosynthesis
  • Statistics as Topic
  • Thermodynamics

Substances

  • Amino Acids
  • Bacterial Proteins
  • DNA, Bacterial