High-accuracy haplotype imputation using unphased genotype data as the references

Wenzhi Li; Wei Xu; Guoxing Fu; Li Ma; Jendai Richards; Weinian Rao; Tameka Bythwood; Shiwen Guo; Qing Song

doi:10.1016/j.gene.2015.07.082

Gene. 2015 Nov 10;572(2):279-84. doi: 10.1016/j.gene.2015.07.082. Epub 2015 Jul 30.

Authors

Wenzhi Li¹, Wei Xu², Guoxing Fu³, Li Ma⁴, Jendai Richards², Weinian Rao³, Tameka Bythwood², Shiwen Guo⁵, Qing Song⁶

Affiliations

¹ Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China; Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA.
² Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA.
³ 4DGenome Inc, Atlanta, GA, USA.
⁴ Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA; 4DGenome Inc, Atlanta, GA, USA.
⁵ Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China. Electronic address: [email protected].
⁶ Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA; 4DGenome Inc, Atlanta, GA, USA; First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China. Electronic address: [email protected].

Abstract

Rapidly growing genomic datasets pose a new challenge for missing-data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of the world's populations. We explored the use of existing unphased genotype datasets as references; if successful, this approach would cover almost all of the world's populations. The results showed that our HiFi software achieves 99.43% accuracy with unphased genotype references. Our method provides a cost-effective way to break through the bottleneck of limited reference availability for haplotype imputation in the big data era.

Keywords: Big data; Imputation; References.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Genetics, Population / standards*
Genome, Human
Haplotypes*
Humans
Models, Genetic
Polymorphism, Single Nucleotide
Sequence Analysis, DNA / economics
Sequence Analysis, DNA / standards*
Software
