A comprehensive evaluation of SNP genotype imputation

Hum Genet. 2009 Mar;125(2):163-71. doi: 10.1007/s00439-008-0606-5. Epub 2008 Dec 17.

Abstract

Genome-wide association studies have contributed significantly to the genetic dissection of complex diseases. In order to increase the power of existing marker sets even further, methods have been proposed to predict individual genotypes at untyped loci from other marker sets by imputation, usually employing HapMap data as a reference. Although various imputation algorithms have already been used in practice, a comprehensive evaluation and comparison of these approaches, using genome-wide SNP data from one and the same population, is still lacking. We therefore investigated four publicly available programs for genotype imputation (BEAGLE, IMPUTE, MACH, and PLINK) using data from 449 German individuals genotyped in our laboratory for three genome-wide SNP sets [Affymetrix 5.0 (500 k), Affymetrix 6.0 (1,000 k), and Illumina 550 k]. We observed that HapMap-based imputation in a northern European population is powerful and reliable, even in highly variable genomic regions such as the extended MHC on chromosome 6p21. However, while genotype predictions were found to be highly accurate with all four programs, the number of SNPs for which imputation was actually carried out ('imputation efficacy') varied substantially. BEAGLE, IMPUTE, and MACH yielded nearly identical trade-offs between imputation accuracy and efficacy, whereas PLINK performed consistently worse. We nevertheless recommend either MACH or BEAGLE for practical use because these two programs are more user-friendly and generally require less memory than IMPUTE.
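
The trade-off between imputation accuracy and efficacy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the following are assumptions made for illustration only: a genotype counts as "imputed" when its confidence score reaches a chosen threshold, efficacy is the fraction of masked genotypes that were imputed, and accuracy is the fraction of imputed genotypes that match the true genotype. The function name `accuracy_and_efficacy`, the genotype coding (0, 1, 2 minor-allele copies), and the example values are hypothetical and are not output from any of the four programs.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_efficacy(
    true_genotypes: Sequence[int],
    imputed_genotypes: Sequence[int],
    confidences: Sequence[float],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Return (accuracy, efficacy) at a given confidence threshold.

    Assumed definitions (not taken from the paper):
      efficacy = imputed genotypes / all masked genotypes
      accuracy = correctly imputed genotypes / imputed genotypes
    Accuracy is None when nothing passes the threshold.
    """
    if not (len(true_genotypes) == len(imputed_genotypes) == len(confidences)):
        raise ValueError("all inputs must have the same length")
    total = len(true_genotypes)
    if total == 0:
        raise ValueError("at least one genotype is required")

    # Keep only the genotypes whose confidence reaches the threshold.
    called = [
        (truth, guess)
        for truth, guess, conf in zip(true_genotypes, imputed_genotypes, confidences)
        if conf >= threshold
    ]

    efficacy = len(called) / total
    if not called:
        return None, efficacy
    accuracy = sum(truth == guess for truth, guess in called) / len(called)
    return accuracy, efficacy


# Hypothetical toy data: five masked genotypes, one of them imputed wrongly
# with low confidence.
truth = [0, 1, 2, 1, 0]
guess = [0, 1, 1, 1, 0]
conf = [0.99, 0.95, 0.60, 0.80, 0.99]

strict = accuracy_and_efficacy(truth, guess, conf, threshold=0.9)
lenient = accuracy_and_efficacy(truth, guess, conf, threshold=0.5)
```

In this toy example, raising the threshold removes the one wrong call, so accuracy rises while efficacy falls. Sweeping the threshold and plotting the resulting pairs is one way to compare programs on an accuracy-versus-efficacy curve.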

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Genetic Markers / genetics
  • Genetics, Population*
  • Genome-Wide Association Study / methods*
  • Genomics / methods*
  • Genotype
  • Germany
  • Humans
  • Linkage Disequilibrium
  • Male
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Software*

Substances

  • Genetic Markers