Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes

Am J Hum Genet. 1998 Aug;63(2):474-88. doi: 10.1086/301965.

Abstract

The spectrum of single-base-pair substitutions logged in The Human Gene Mutation Database (HGMD), comprising 7,271 different lesions in the coding regions of 547 different human genes, was analyzed for nearest-neighbor effects on relative mutation rates. Owing to its retrospective nature, HGMD allows mutation rates to be estimated only in relative terms. Therefore, a novel methodology was devised in order to obtain these estimates in iterative fashion, correcting, at the same time, for the confounding effects of differential codon usage and for the fact that different types of amino acid replacement come to clinical attention with different probabilities. Over and above the hypermutability of CpG dinucleotides, reflected in transition rates five times the base mutation rate, only a subtle and locally confined influence of the surrounding DNA sequence on relative single-base-pair substitution rates was observed, which extended no farther than 2 bp from the substitution site. A disparity between the two DNA strands was evidenced by the fact that, when substitution rates were estimated conditional on the 5' and 3' flanking nucleotides, a significant rate difference emerged for 10 of 96 possible pairs of complementary substitutional events. Mutational bias, favoring substitutions toward flanking bases, a phenomenon reminiscent of misalignment mutagenesis, was apparent and exhibited both directionality and reading-frame sensitivity. No specific preponderance of repeat-sequence motifs was observed in the vicinity of nucleotide substitutions, but a moderate correlation between the relative mutability and thermodynamic stability of DNA triplets emerged, suggesting either inefficient DNA replication in regions of high stability or the transient stabilization of misaligned intermediates.
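The figure of 96 pairs of complementary substitutional events follows from simple counting: with one 5' and one 3' flanking nucleotide there are 4 × 4 × 12 = 192 context-specific substitutions, and each one corresponds to exactly one distinct event on the opposite strand. The short Python sketch below illustrates that bookkeeping only. It is not the authors' estimation procedure, and the names `reverse_complement_event`, `complementary_pairs`, and `is_cpg_transition` are hypothetical helpers introduced for illustration.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_event(event):
    """Return the same substitution as read on the opposite strand.

    An event is (5' flank, original base, 3' flank, new base).
    On the other strand the flanks swap places and every base is complemented.
    """
    five, ref, three, alt = event
    return (COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five], COMPLEMENT[alt])


def complementary_pairs():
    """Enumerate all flank-conditioned substitutions and pair them across strands."""
    events = [
        (five, ref, three, alt)
        for five, ref, three, alt in product(BASES, repeat=4)
        if ref != alt  # a substitution must change the central base
    ]
    # A frozenset of an event and its strand counterpart identifies one pair.
    pairs = {frozenset((e, reverse_complement_event(e))) for e in events}
    return events, pairs


def is_cpg_transition(event):
    """True for C>T at a CpG site, or the equivalent G>A seen on the other strand."""
    five, ref, three, alt = event
    return (ref == "C" and three == "G" and alt == "T") or (
        ref == "G" and five == "C" and alt == "A"
    )


events, pairs = complementary_pairs()
print(len(events), len(pairs))  # 192 events grouped into 96 pairs
```

No event is its own strand counterpart, because a base never equals its complement, so every pair contains two distinct events. A test for strand disparity would then compare the two estimated rates within each pair.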

MeSH terms

  • Base Pairing*
  • Base Sequence*
  • DNA / chemistry
  • DNA / genetics
  • Databases as Topic
  • Genetic Diseases, Inborn / genetics*
  • Germ-Line Mutation*
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Point Mutation*
  • Regression Analysis
  • Reproducibility of Results
  • Thermodynamics

Substances

  • DNA