Error detection in SNP data by considering the likelihood of recombinational history implied by three-site combinations

Bioinformatics. 2007 Jul 15;23(14):1807-14. doi: 10.1093/bioinformatics/btm260. Epub 2007 May 17.

Abstract

Motivation: Errors in nucleotide sequence and SNP genotyping data are problematic when inferring haplotypes. Previously published methods for error detection in haplotype data make use of pedigree information; however, for many samples, individuals are not related by pedigree. This article describes a method for detecting errors in haplotypes by considering the recombinational history implied by the patterns of variation, three SNPs at a time.
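As a rough illustration of what scanning variation "three SNPs at a time" can look like, here is a minimal Python sketch. It is not the EDUT script and it does not reproduce the authors' likelihood-based treatment of recombinational history. It substitutes a much simpler counting heuristic, under these assumptions: haplotypes are phased and biallelic (coded '0'/'1'), and under an infinite-sites model without recombination three sites can show at most four distinct haplotypes. A triplet showing more than four is therefore examined, and any three-site haplotype carried by very few samples is flagged. The function names and thresholds (`flag_suspect_triplets`, `suspicion_scores`, `max_expected`, `max_count`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def flag_suspect_triplets(haplotypes, max_expected=4, max_count=1):
    """Flag rare three-site haplotypes in triplets with excess diversity.

    haplotypes: list of equal-length strings of '0'/'1' alleles (assumed
    phased and biallelic). Returns a list of (sites, pattern, carriers).
    This is an illustrative counting heuristic, not the published method.
    """
    n_sites = len(haplotypes[0])
    flags = []
    for sites in combinations(range(n_sites), 3):
        # Project every haplotype onto the three chosen sites.
        patterns = ["".join(h[i] for i in sites) for h in haplotypes]
        counts = Counter(patterns)
        # Three sites yield at most four haplotypes without recombination
        # or recurrent mutation, so such triplets are not informative here.
        if len(counts) <= max_expected:
            continue
        for pattern, count in counts.items():
            if count <= max_count:
                carriers = [k for k, p in enumerate(patterns) if p == pattern]
                flags.append((sites, pattern, carriers))
    return flags


def suspicion_scores(haplotypes):
    """Count, per haplotype index, how many triplets flagged it."""
    scores = Counter()
    for _, _, carriers in flag_suspect_triplets(haplotypes):
        scores.update(carriers)
    return scores


if __name__ == "__main__":
    # Toy data: five tree-compatible haplotypes in two copies each, plus
    # one copy of '0010' that mimics a single miscalled allele.
    sample = ["0000", "0000", "1000", "1000", "1100", "1100",
              "1110", "1110", "1111", "1111", "0010"]
    print(suspicion_scores(sample))
```

In this toy data set the last haplotype (index 10) is the only one flagged, and it is flagged in three of the four triplets. A real analysis would need to weigh how plausible recombination between the sites is before treating a rare three-site haplotype as an error, which is the part this sketch leaves out.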

Results: Coalescent simulations provide evidence that the method is robust to high levels of recombination as well as homologous gene conversion, indicating that patterns produced by both proximate and distant SNPs may be useful for detecting unlikely three-site haplotypes.

Availability: The Perl script implementing the described method is called EDUT (Error Detection Using Triplets) and is available on request from the authors.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Cloning, Molecular
  • Computational Biology / methods*
  • Computer Simulation
  • Genotype
  • Haplotypes
  • Heterozygote
  • Humans
  • Likelihood Functions
  • Models, Biological
  • Models, Genetic
  • Pedigree
  • Polymorphism, Genetic
  • Polymorphism, Single Nucleotide*
  • Software