Here, we demonstrate improvements to our bioinformatic pipeline, PING, which provides high-resolution genotyping from killer-cell immunoglobulin-like receptor (KIR) sequencing data. These improvements extend the method to KIR interpretation from whole genome sequencing (WGS) data. We evaluated performance using synthetic sequence datasets and real-world data from the 1000 Genomes Project (1KGP). On the synthetic sequence dataset designed to approximate real-world data (N = 1366), PING achieved 95% exonic genotyping accuracy. This result was mirrored in the analysis of 1KGP European data (N = 215), in which most genes showed near or below 5% frequency of unresolved exonic genotypes, an important indicator of genotyping errors in real-world data. An analysis of the distribution of genotyping errors in the synthetic sequence datasets gave insight into how to further improve genotype accuracy. Similarly, an analysis of ambiguous exonic genotype frequencies in the 1KGP European data, which showed high rates of unresolved genotypes, highlighted that an effective phasing method will be an impactful future addition to the PING workflow. Together, these results demonstrate that PING can effectively provide high-resolution KIR genotyping from WGS data.
Keywords: KIR; NGS; bioinformatics pipelines; copy number; genotyping; immunogenetics; natural killer cells; variant calling.
© 2022 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.