Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes

Genome Res. 2003 May;13(5):813-20. doi: 10.1101/gr.1064503.

Abstract

Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of evolutionary constraint at the single nucleotide level. High-resolution quantification would be informative for determining the distribution of important positions within functional elements and for evaluating the relative importance of nucleotide sites that carry single nucleotide polymorphisms (SNPs). Because the level of resolution in comparative sequence analyses is a direct function of sequence diversity, we propose that the information content of a candidate mammalian genome be defined as the sequence divergence it would add relative to already-sequenced genomes. We show that reliable estimates of genomic sequence divergence can be obtained from small genomic regions. On the basis of a multiple sequence alignment of approximately 1.4 megabases each from eight mammals, we generate such estimates for five unsequenced mammals. Estimates of the neutral divergence in these data suggest that a small number of diverse mammalian genomes in addition to human, mouse, and rat would allow single nucleotide resolution in comparative sequence analyses.
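The proposed definition of information content, the divergence a candidate genome would add relative to already-sequenced genomes, can be illustrated with a minimal sketch. It assumes one plausible reading: added divergence is the increase in total branch length of the tree spanning the sequenced species once the candidate is attached. The tree topology, branch lengths, and the function names `spanning_length` and `added_divergence` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy tree, NOT data from the paper.
# Maps each node to (parent, branch length in substitutions per site).
Tree = Dict[str, Tuple[str, float]]

TOY_TREE: Tree = {
    "human": ("primate_rodent", 0.10),
    "mouse": ("rodents", 0.08),
    "rat": ("rodents", 0.09),
    "rodents": ("primate_rodent", 0.25),
    "primate_rodent": ("root", 0.02),
    "dog": ("root", 0.20),
}


def spanning_length(tree: Tree, leaves: Iterable[str]) -> float:
    """Total branch length of the minimal subtree connecting `leaves`.

    A branch belongs to that subtree only if selected leaves lie on
    both sides of it.
    """
    selected = set(leaves)
    below: Dict[str, int] = {}  # selected leaves beneath each branch
    for leaf in selected:
        node = leaf
        while node in tree:  # walk up until the root (which has no entry)
            below[node] = below.get(node, 0) + 1
            node = tree[node][0]
    total = len(selected)
    return sum(tree[node][1] for node, count in below.items() if count < total)


def added_divergence(tree: Tree, sequenced: Iterable[str], candidate: str) -> float:
    """Branch length gained by adding `candidate` to the `sequenced` set."""
    sequenced = set(sequenced)
    return spanning_length(tree, sequenced | {candidate}) - spanning_length(tree, sequenced)


if __name__ == "__main__":
    gain = added_divergence(TOY_TREE, ["human", "mouse", "rat"], "dog")
    print(f"dog adds {gain:.2f} substitutions per site")
```

Under this reading, a candidate that joins the tree on a long, otherwise unsampled branch scores higher than one closely related to a species already sequenced, which matches the abstract's emphasis on a small number of diverse genomes.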

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Cats
  • Cattle
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Cystic Fibrosis Transmembrane Conductance Regulator / genetics
  • DNA / genetics
  • Dogs
  • Evolution, Molecular*
  • Genome*
  • Genome, Human
  • Humans
  • Mice
  • Mutagenesis / genetics
  • Pan troglodytes / genetics
  • Papio / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Rats
  • Sequence Alignment / methods
  • Sequence Alignment / statistics & numerical data
  • Sequence Analysis, DNA / methods*
  • Swine / genetics

Substances

  • CFTR protein, human
  • Cystic Fibrosis Transmembrane Conductance Regulator
  • DNA