Calculation of IBD probabilities with dense SNP or sequence data

Genet Epidemiol. 2008 Sep;32(6):513-9. doi: 10.1002/gepi.20324.

Abstract

The probabilities that two individuals share 0, 1, or 2 alleles identical by descent (IBD) at a given genotyped marker locus are quantities of fundamental importance for disease gene and quantitative trait mapping and in family-based tests of association. Until recently, genotyped markers were sufficiently sparse that founder haplotypes could be modelled as having been drawn from a population in linkage equilibrium for the purpose of estimating IBD probabilities. However, with the advent of high-throughput single nucleotide polymorphism genotyping assays, this is no longer a reasonable assumption. Indeed, the imminent arrival of individual sequencing will enable high-density single nucleotide polymorphism genotyping on a scale for which current algorithms are not equipped. In this paper, we present a simple new model in which founder haplotypes are modelled as a Markov chain. Another important innovation is that genotyping errors are explicitly incorporated into the model. We compare results obtained using the new model to those obtained using the popular genetic linkage analysis package Merlin, with and without using the cluster model of linkage disequilibrium that is incorporated into that program. We find that the new model results in accuracy approaching that of Merlin with haplotype blocks, but achieves this with orders of magnitude faster run times. Moreover, the new algorithm scales linearly with number of markers, irrespective of density, whereas Merlin scales supralinearly. We also confirm a previous finding that ignoring linkage disequilibrium in founder haplotypes can cause errors in the calculation of IBD probabilities.
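
As a rough illustration of what modelling founder haplotypes as a Markov chain can mean, the sketch below fits a first-order chain to a handful of phased haplotypes and compares the resulting haplotype probability with the one implied by linkage equilibrium. It is not the authors' implementation. The function names, the 0/1 allele coding, the pseudocount smoothing, and the symmetric per-allele error model are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Haplotype = Sequence[int]  # alleles coded 0/1 at each marker (assumed coding)


def fit_markov_chain(
    haplotypes: Sequence[Haplotype], pseudocount: float = 1.0
) -> Tuple[List[float], List[List[List[float]]]]:
    """Estimate first-marker allele frequencies and per-interval transition
    probabilities P(allele at m+1 | allele at m) from phased haplotypes."""
    n_markers = len(haplotypes[0])

    first = [pseudocount, pseudocount]
    for h in haplotypes:
        first[h[0]] += 1
    initial = [c / sum(first) for c in first]

    transitions = []
    for m in range(n_markers - 1):
        counts = [[pseudocount, pseudocount], [pseudocount, pseudocount]]
        for h in haplotypes:
            counts[h[m]][h[m + 1]] += 1
        transitions.append([[c / sum(row) for c in row] for row in counts])
    return initial, transitions


def markov_haplotype_prob(
    h: Haplotype, initial: List[float], transitions: List[List[List[float]]]
) -> float:
    """Probability of a haplotype under the fitted first-order chain."""
    p = initial[h[0]]
    for m in range(len(h) - 1):
        p *= transitions[m][h[m]][h[m + 1]]
    return p


def equilibrium_haplotype_prob(h: Haplotype, haplotypes: Sequence[Haplotype]) -> float:
    """Probability of a haplotype if markers were independent
    (linkage equilibrium): product of marginal allele frequencies."""
    p = 1.0
    for m, allele in enumerate(h):
        freq = sum(1 for g in haplotypes if g[m] == allele) / len(haplotypes)
        p *= freq
    return p


def observed_allele_likelihood(observed: int, true: int, error_rate: float) -> float:
    """Assumed symmetric error model: the true allele is misread
    with probability error_rate."""
    return 1.0 - error_rate if observed == true else error_rate


# Two markers in strong linkage disequilibrium (toy data).
sample = [(0, 0), (0, 0), (1, 1), (1, 1)]
initial, transitions = fit_markov_chain(sample, pseudocount=1.0)
p_markov = markov_haplotype_prob((0, 0), initial, transitions)
p_equilibrium = equilibrium_haplotype_prob((0, 0), sample)
```

On this toy sample the chain assigns the haplotype (0, 0) a higher probability than the linkage-equilibrium product does, because it captures the dependence between adjacent markers. Evaluating one haplotype takes a single pass over the markers, so the cost of this step grows linearly with marker count.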

Publication types

  • Research Support, Non-U.S. Gov't
  • Twin Study

MeSH terms

  • Algorithms
  • Alleles
  • Cluster Analysis
  • Computer Simulation
  • Genetic Markers
  • Haplotypes
  • Humans
  • Linkage Disequilibrium
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical*
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait, Heritable

Substances

  • Genetic Markers