Genetic matching of loci in the human leukocyte antigen (HLA) region between donor and patient is critical to the outcome of hematopoietic stem cell transplantation (HSCT); however, the methods used to HLA-genotype donors in unrelated stem cell registries often yield results with allelic and phase ambiguity, do not query all clinically relevant loci, or both. We present and evaluate a statistical method for in silico imputation of HLA alleles and haplotypes in large, ambiguous population data from the Be The Match® Registry. Our method builds on haplotype frequencies estimated from registry populations and exploits patterns of linkage disequilibrium (LD) across HLA haplotypes to infer high-resolution HLA assignments. We validated the method on simulated and real population data from the Registry with non-trivial ambiguity content. Although real population datasets caused some predictions to deviate from expectation, validation still showed high percent recall for imputed results, with average recall >76% when imputing HLA alleles from registry data. We also simulated the ambiguity generated by several HLA genotyping methods to evaluate imputation performance at several levels of typing resolution. On average, imputation percent recall of allele-level HLA haplotypes was >95% for allele-level typing, >92% for intermediate-resolution typing and >58% for serology (low-resolution) typing. Thus, allele-level HLA assignments can be imputed by applying a set of statistical and population genetics inferences, given knowledge of haplotype frequencies and self-identified race and ethnicity.
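As a rough illustration of frequency-based imputation, the sketch below ranks the haplotype pairs consistent with an ambiguous typing by their likelihood under population haplotype frequencies. It is a minimal sketch and not the Registry implementation: the function name `impute_haplotype_pairs`, the two-locus layout, the Hardy-Weinberg weighting, and the frequency values are all assumptions chosen for illustration (the allele names are real, the frequencies are invented).

```python
def impute_haplotype_pairs(hap_freqs, typing):
    """Rank haplotype pairs consistent with an ambiguous typing.

    hap_freqs: dict mapping a haplotype (tuple with one allele per locus)
               to its population frequency (toy values here).
    typing:    list with one entry per locus; each entry is a collection of
               candidate unordered allele pairs allowed by the typing.
    Returns a dict mapping (hap1, hap2) to its normalized probability,
    assuming Hardy-Weinberg proportions (an assumption of this sketch).
    """
    # Normalize candidate allele pairs so that order does not matter.
    allowed = [{tuple(sorted(pair)) for pair in locus} for locus in typing]
    haps = list(hap_freqs)
    scores = {}
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            # Keep the pair only if every locus agrees with the typing.
            consistent = all(
                tuple(sorted((a, b))) in ok
                for a, b, ok in zip(h1, h2, allowed)
            )
            if not consistent:
                continue
            weight = hap_freqs[h1] * hap_freqs[h2]
            if h1 != h2:
                weight *= 2  # two phase orderings give the same genotype
            scores[(h1, h2)] = weight
    total = sum(scores.values())
    return {pair: s / total for pair, s in scores.items()} if total else {}


# Hypothetical haplotype frequencies for a two-locus (A, B) example.
toy_freqs = {
    ("A*01:01", "B*08:01"): 0.10,
    ("A*01:01", "B*07:02"): 0.02,
    ("A*02:01", "B*07:02"): 0.05,
    ("A*02:01", "B*08:01"): 0.01,
}

# Alleles are known at each locus, but their phase is not.
toy_typing = [
    [("A*01:01", "A*02:01")],
    [("B*07:02", "B*08:01")],
]

posterior = impute_haplotype_pairs(toy_freqs, toy_typing)
for pair, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(pair, round(prob, 4))
```

In this toy case two phasings fit the typing, and the pairing that combines the two more frequent haplotypes receives most of the probability mass, which is how linkage disequilibrium information resolves phase ambiguity in this simplified setting.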
Keywords: expectation maximization; human leukocyte antigen; imputation; maximum likelihood; typing ambiguity; typing resolution.
© 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.