A computational method for resequencing long DNA targets by universal oligonucleotide arrays

Proc Natl Acad Sci U S A. 2002 Nov 26;99(24):15492-6. doi: 10.1073/pnas.232278299. Epub 2002 Nov 12.

Abstract

Universal arrays contain all possible oligonucleotides of a certain length, typically 6-10 bases. They can determine in a single experiment all substrings of that length that occur along a target sequence. That information, also called the spectrum of the sequence, is not sufficient to uniquely reconstruct a sequence longer than a few hundred bases. We have devised a polynomial-time algorithm that reconstructs the sequence, given the spectrum and an additional reference sequence that is homologous to the target sequence. Such a reference is available, for example, in the identification of single-nucleotide polymorphisms. The algorithm can handle errors in the spectrum as well as substitutions, insertions, and deletions in the target sequence. We present extensive simulation results, which show that the algorithm correctly reconstructs target sequences of more than 2,000 nucleotides from error-prone 8-mer spectra when realistic levels of single-nucleotide polymorphisms are present.
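
The notion of a spectrum, and why it does not by itself determine the sequence, can be illustrated with a minimal Python sketch. This is not the reconstruction algorithm described in the abstract. The function name `spectrum`, the choice of k = 3, and the two toy sequences are assumptions made purely for illustration, and the spectrum is modelled as an error-free set of k-mers without multiplicities.

```python
def spectrum(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of `sequence`.

    Illustrative helper only: it models an ideal, error-free universal
    array readout and ignores how many times each k-mer occurs.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Two hypothetical toy sequences. The 2-mer "AC" occurs three times in each,
# so the segments between its occurrences can be swapped without changing
# which 3-mers are present.
seq_a = "TACGACCACT"
seq_b = "TACCACGACT"

spec_a = spectrum(seq_a, 3)
spec_b = spectrum(seq_b, 3)

print(sorted(spec_a))
print(seq_a != seq_b and spec_a == spec_b)
```

Both toy sequences yield the same eight 3-mers even though the sequences differ, so the spectrum alone cannot tell them apart. This is the kind of ambiguity that the additional reference sequence is meant to resolve.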

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • DNA / genetics*
  • Nucleic Acid Hybridization
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Analysis, DNA / methods*
  • Software

Substances

  • DNA