Biological origins of long-range correlations and compositional variations in DNA

Nucleic Acids Res. 1993 Nov 11;21(22):5167-70. doi: 10.1093/nar/21.22.5167.

Abstract

The occurrence of certain long-range correlations between nucleotides in the DNA sequences of living organisms has recently been reported. The biological origin of these correlations was unknown. The correlations were proposed to be related to fractal structure and to differences between intron-containing and intron-less sequences. We and others have reported that no consistent difference exists between intron-containing and intron-less sequences. In agreement with this, we demonstrate here that the long-range correlations are trivially equivalent to the varying ratio R between pyrimidines and purines (or any other nucleotide combinations) in different regions of a DNA sequence. Moreover, we show that this variation of R has simple biological explanations: differences in base composition occur along most DNA sequences and are associated with (i) simple repeats, (ii) differences in codon composition (due to the amino acid composition of the encoded protein), (iii) changes in the direction of transcription (and thus also of translation), and (iv) differences between protein- and rRNA-encoding segments. Seven biological examples are given.
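The claim that the long-range correlations reduce to regional variation in R can be illustrated with a minimal sketch. It assumes the purine-pyrimidine "DNA walk" convention used in DNA correlation analyses (+1 for a pyrimidine, -1 for a purine) and measures correlation with a simple fluctuation function F(l) ~ l^α, where α ≈ 0.5 means no correlation. The helper names, window sizes, block lengths, and synthetic sequences are hypothetical illustrations, not the authors' method or data. Because every A, C, G, or T base in a window is either a pyrimidine or a purine, the mean walk slope over a window is (n_y - n_r)/(n_y + n_r) = (R - 1)/(R + 1). Any change in slope between regions is therefore a change in R.

```python
import math
import random

# Assumed DNA-walk convention: +1 for a pyrimidine (C, T), -1 for a purine (A, G).
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def dna_walk(seq):
    """Return y(0..N) with y(0) = 0; symbols other than ACGT (e.g. N) add 0."""
    walk = [0]
    for base in seq.upper():
        step = 1 if base in PYRIMIDINES else -1 if base in PURINES else 0
        walk.append(walk[-1] + step)
    return walk


def ratio_r(seq):
    """R = pyrimidines / purines in a segment, or None if it contains no purines."""
    s = seq.upper()
    n_y = sum(1 for b in s if b in PYRIMIDINES)
    n_r = sum(1 for b in s if b in PURINES)
    return n_y / n_r if n_r else None


def fluctuation(walk, window_len):
    """F(l), l = window_len: std. deviation of y(i + l) - y(i) over all i."""
    d = [walk[i + window_len] - walk[i] for i in range(len(walk) - window_len)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))


def scaling_exponent(seq, l_small=16, l_large=256):
    """Log-log slope of F(l) between two window sizes; about 0.5 with no correlation."""
    walk = dna_walk(seq)
    ratio = fluctuation(walk, l_large) / fluctuation(walk, l_small)
    return math.log(ratio) / math.log(l_large / l_small)


def random_segment(rng, length, pyr_fraction):
    """Hypothetical test data: independent bases with a fixed pyrimidine fraction."""
    w_y, w_r = pyr_fraction / 2, (1 - pyr_fraction) / 2
    return "".join(rng.choices("CTAG", weights=[w_y, w_y, w_r, w_r], k=length))


rng = random.Random(1993)
# One composition throughout vs. blocks of 2,000 bases alternating between
# pyrimidine-rich (70%) and purine-rich (30%); bases are independent in both.
homogeneous = random_segment(rng, 20_000, 0.5)
patchy = "".join(random_segment(rng, 2_000, 0.7 if k % 2 == 0 else 0.3) for k in range(10))

# Within any all-ACGT window, the mean walk slope is fixed by R:
# (n_y - n_r) / (n_y + n_r) = (R - 1) / (R + 1).
window = patchy[:2_000]
r = ratio_r(window)
assert abs(dna_walk(window)[-1] / len(window) - (r - 1) / (r + 1)) < 1e-12

print("alpha, homogeneous:", round(scaling_exponent(homogeneous), 2))
print("alpha, patchy:", round(scaling_exponent(patchy), 2))
```

With these settings, one would expect α near 0.5 for the homogeneous sequence and a markedly larger value for the patchy one. Neither sequence has any base-to-base dependence within a block, so the apparent long-range correlation comes entirely from R changing between blocks. Comparing only two window sizes keeps the sketch short; a fuller analysis would fit F(l) over many scales.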

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bacteriophage lambda / genetics
  • Base Composition
  • Base Sequence
  • DNA / genetics*
  • Genetic Variation*
  • Humans
  • Introns
  • Molecular Sequence Data
  • Myosins / genetics

Substances

  • DNA
  • Myosins