Data mining: Efficiency of using sequence databases for polymorphism discovery

Hum Mutat. 2001 Feb;17(2):141-50. doi: 10.1002/1098-1004(200102)17:2<141::AID-HUMU6>3.0.CO;2-1.

Abstract

An open question in research on single nucleotide polymorphisms (SNPs) is what percentage of true SNPs can be found by in silico pre-screening. To address it, we selected 13 genes and determined the complete set of "true" polymorphisms (that is, polymorphisms detected experimentally) in these genes, either in our laboratory using denaturing high-performance liquid chromatography (DHPLC) and fluorescent sequencing, or in other laboratories using comparable methods. The genes studied by our group were PTGS2, IGFBP1, IGFBP3, and CYP19. GenBank sequences were then aligned using two methods, and the sequence differences were termed "candidate" polymorphisms. Comparing the SNPs obtained experimentally with those obtained in silico, we found that in silico methods are relatively specific (up to 55% of the candidate SNPs found by SNPFinder were confirmed experimentally) but have low sensitivity (no more than 27% of true SNPs were found by in silico methods).
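The comparison described above can be illustrated with a minimal Python sketch. It is not SNPFinder and does not reproduce either of the alignment methods used in the study. It assumes the sequences have already been aligned and that experimentally detected SNPs are given as column positions in the same coordinate system. The function names `candidate_snps` and `compare` and the toy data are hypothetical. The 55% figure in the abstract is the fraction of candidates that were confirmed, a quantity usually called positive predictive value (precision). The 27% figure is the fraction of true SNPs that were recovered, which is sensitivity (recall).

```python
from typing import Dict, List, Set


def candidate_snps(aligned: List[str]) -> Set[int]:
    """Return 0-based alignment columns where the aligned sequences disagree.

    Assumption: input is a pre-computed alignment (all strings equal length).
    Gaps ('-') and ambiguous 'N' calls are ignored, so only base
    substitutions are reported as candidate polymorphisms.
    """
    if not aligned:
        return set()
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    candidates = set()
    for col in range(length):
        bases = {seq[col].upper() for seq in aligned} - {"-", "N"}
        if len(bases) > 1:  # more than one distinct base in this column
            candidates.add(col)
    return candidates


def compare(candidates: Set[int], true_snps: Set[int]) -> Dict[str, float]:
    """Score in silico candidates against experimentally confirmed SNPs."""
    confirmed = candidates & true_snps
    # Fraction of candidates that were confirmed experimentally (PPV).
    ppv = len(confirmed) / len(candidates) if candidates else 0.0
    # Fraction of true SNPs that the in silico screen recovered.
    sensitivity = len(confirmed) / len(true_snps) if true_snps else 0.0
    return {"confirmed": len(confirmed), "ppv": ppv, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Hypothetical toy alignment of three database sequences for one region.
    toy_alignment = [
        "ACGTACGTAC",
        "ACGTTCGTAC",
        "ACGAACGT-C",
    ]
    # Hypothetical experimentally detected SNP positions (same coordinates).
    toy_true = {4, 7, 15}

    found = candidate_snps(toy_alignment)
    print(sorted(found))
    print(compare(found, toy_true))
```

With this toy input, columns 3 and 4 are reported as candidates; only column 4 is a true SNP, giving a PPV of 0.5 and a sensitivity of 1/3. A screen that relies on sequence differences between database entries can miss any true SNP that is absent from the deposited sequences, which is consistent with the low sensitivity the abstract reports.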

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aromatase / genetics
  • Chromatography, High Pressure Liquid / methods
  • Cyclooxygenase 2
  • DNA / chemistry
  • DNA / genetics
  • Databases, Factual*
  • Expressed Sequence Tags
  • Humans
  • Information Storage and Retrieval
  • Insulin-Like Growth Factor Binding Protein 1 / genetics
  • Insulin-Like Growth Factor Binding Protein 3 / genetics
  • Isoenzymes / genetics
  • Membrane Proteins
  • Polymorphism, Genetic / genetics*
  • Polymorphism, Single Nucleotide / genetics
  • Prostaglandin-Endoperoxide Synthases / genetics
  • Sequence Alignment
  • Sequence Analysis, DNA / methods

Substances

  • Insulin-Like Growth Factor Binding Protein 1
  • Insulin-Like Growth Factor Binding Protein 3
  • Isoenzymes
  • Membrane Proteins
  • DNA
  • Aromatase
  • Cyclooxygenase 2
  • PTGS2 protein, human
  • Prostaglandin-Endoperoxide Synthases