Accounting for haplotype phase uncertainty in linkage disequilibrium estimation

Genet Epidemiol. 2008 Feb;32(2):168-78. doi: 10.1002/gepi.20273.

Abstract

The characterization of linkage disequilibrium (LD) is applied in a variety of studies, including the identification of molecular determinants of the local recombination rate, the reconstruction of migration and demographic history of populations, and the assessment of the role of positive selection in adaptation. LD estimates suffer from uncertainty in the phase of the haplotypes used in their calculation, which reflects limitations of the algorithms used for haplotype estimation. We introduce an LD calculation method that deals with phase uncertainty by weighting all possible haplotype pairs according to their estimated probabilities as evaluated by PHASE. In contrast to the expectation-maximization (EM) algorithm as implemented in the HAPLOVIEW and GENETICS packages, our method considers haplotypes based on the entire genetic information available for the candidate region. We tested the method using simulated and real genotyping data. The results show that, for all practical purposes, the new method is advantageous in comparison with algorithms that calculate LD using only the most probable haplotype or bilocus haplotypes based on the EM algorithm. The new method deals especially well with low-LD regions, which contribute strongly to phase uncertainty. Altogether, the method is an attractive alternative to standard LD calculation procedures, including those based on the EM algorithm. We implemented the method in the R software environment, together with an interface to the popular haplotype calculation package PHASE.
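The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation and it does not read PHASE output files. The function name `weighted_ld`, the input layout (for each individual, a list of candidate haplotype pairs with probabilities) and the toy data are all assumptions made for illustration. Each candidate pair contributes to the two-locus haplotype counts in proportion to its probability, and D, D' and r² are then computed from the resulting expected frequencies.

```python
from collections import defaultdict


def weighted_ld(individuals, i, j):
    """Estimate D, D' and r^2 between biallelic sites i and j.

    individuals: one list per individual of (hap_a, hap_b, prob) tuples,
    where hap_a and hap_b are strings over '0'/'1' and prob is the
    estimated probability of that haplotype pair (hypothetical layout).
    """
    counts = defaultdict(float)  # expected counts of two-locus haplotypes
    total = 0.0
    for pairs in individuals:
        norm = sum(p for _, _, p in pairs)  # renormalise per individual
        for hap_a, hap_b, p in pairs:
            w = p / norm
            for hap in (hap_a, hap_b):
                counts[(hap[i], hap[j])] += w
                total += w

    p11 = counts[("1", "1")] / total
    p_a = (counts[("1", "1")] + counts[("1", "0")]) / total  # allele 1 at site i
    p_b = (counts[("1", "1")] + counts[("0", "1")]) / total  # allele 1 at site j

    d = p11 - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")

    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else float("nan")
    return d, d_prime, r2


# Toy data (invented): one individual with certain phase, and one double
# heterozygote whose phase is uncertain.
example = [
    [("11", "00", 1.0)],
    [("11", "00", 0.7), ("10", "01", 0.3)],
]

if __name__ == "__main__":
    print(weighted_ld(example, 0, 1))
```

In this toy data set, weighting both candidate phasings of the second individual gives r² of about 0.49, whereas keeping only its most probable pair ("11", "00") would give r² = 1. This illustrates how best-guess haplotypes can overstate LD when phase is uncertain.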

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Computer Simulation
  • Gene Frequency
  • Genotype
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Software
  • Validation Studies as Topic