High-accuracy haplotype imputation using unphased genotype data as the references

Gene. 2015 Nov 10;572(2):279-84. doi: 10.1016/j.gene.2015.07.082. Epub 2015 Jul 30.

Abstract

Rapidly growing genomic datasets pose a new challenge for missing-data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of populations in the world. We explored the use of existing unphased genotype datasets as references; if successful, this approach would cover almost all of the populations in the world. The results showed that our HiFi software yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective way to break through the bottleneck of limited reference availability for haplotype imputation in the big data era.

Keywords: Big data; Imputation; References.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetics, Population / standards*
  • Genome, Human
  • Haplotypes*
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA / economics
  • Sequence Analysis, DNA / standards*
  • Software