Linguistics of nucleotide sequences: morphology and comparison of vocabularies

J Biomol Struct Dyn. 1986 Aug;4(1):11-21. doi: 10.1080/07391102.1986.10507643.

Abstract

The concept of "words" in continuous languages devoid of blanks is introduced, and an operational definition of words is given. With this novel concept, nucleotide sequences become objects of linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 nucleotides (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.
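The abstract does not spell out the operational definition of words. The sketch below is only a rough illustration and swaps in a generic technique. It treats every overlapping k-mer as a candidate word, using k = 3 to 5 to match the word sizes reported above. It takes the k-mer count profile of a sequence as its "vocabulary" and compares two vocabularies by cosine similarity. The function names (`kmer_counts`, `cosine_similarity`), the toy sequences and the choice of cosine similarity are assumptions made for illustration. They are not the authors' method.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a sequence that has no word boundaries.

    Hypothetical stand-in for a 'vocabulary': every k-letter window is
    treated as a candidate word (not the paper's operational definition).
    """
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences (invented for illustration): b differs from a at one
    # position, and c is an unrelated low-complexity sequence.
    a = "ATGGCGTACGTTAGCATGGCGTACG"
    b = "ATGGCGTACGTTAGCATGGCGTTCG"
    c = "TTTTAAAATTTTAAAATTTTAAAA"
    for k in (3, 4, 5):  # word sizes reported in the abstract
        ab = cosine_similarity(kmer_counts(a, k), kmer_counts(b, k))
        ac = cosine_similarity(kmer_counts(a, k), kmer_counts(c, k))
        print(f"k={k}: sim(a, b)={ab:.3f}  sim(a, c)={ac:.3f}")
```

Cosine similarity ignores overall sequence length, so sequences of different sizes can be compared directly. A closer match to the paper's idea would replace raw counts with a statistic that picks out over- or under-represented words. That choice is left open here.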

Publication types

  • Comparative Study

MeSH terms

  • Bacteriophages / genetics
  • Base Sequence*
  • DNA* / genetics
  • Escherichia coli / genetics
  • Genetic Code*

Substances

  • DNA