Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

PLoS Comput Biol. 2022 Jan 13;18(1):e1009628. doi: 10.1371/journal.pcbi.1009628. eCollection 2022 Jan.

Abstract

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures and, with them, the full extent of common genetic variation. Here, we propose sequencing the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, so that the remaining array-genotyped cohort can be imputed more accurately. Using a Tanzania-based cohort as a proof of concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.
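The abstract does not say how the add-on tag SNPs are chosen, so the following is only a minimal sketch of one generic way the idea could work: a greedy, linkage-disequilibrium-based tagging heuristic applied to genotypes from the sequenced subset. It is not the authors' method. The function names (`r_squared`, `select_addon_tags`), the r² threshold of 0.8, the SNP identifiers, and the toy genotypes are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (allele counts 0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic site carries no LD information
    return cov * cov / (var_x * var_y)


def select_addon_tags(genotypes, base_tags, r2_threshold=0.8, max_addons=None):
    """Greedily pick add-on SNPs that tag variants the base array leaves untagged.

    genotypes: dict mapping SNP id -> allele counts in the sequenced subset
    base_tags: SNP ids already present on the base array
    """
    snps = sorted(genotypes)  # sorted so that ties are broken deterministically
    # Each SNP tags itself plus every SNP with r^2 at or above the threshold.
    tagged_by = {s: {s} for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= r2_threshold:
            tagged_by[a].add(b)
            tagged_by[b].add(a)

    # Variants already captured by the base array content.
    covered = set()
    for tag in base_tags:
        covered |= tagged_by.get(tag, set())
    untagged = set(snps) - covered

    addons = []
    while untagged and (max_addons is None or len(addons) < max_addons):
        # Pick the SNP that captures the most still-untagged variants.
        best = max(snps, key=lambda s: len(tagged_by[s] & untagged))
        addons.append(best)
        untagged -= tagged_by[best]
    return addons


# Toy data (hypothetical): six sequenced individuals, five variants.
toy_genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],  # in perfect LD with rs1
    "rs3": [2, 0, 1, 1, 0, 2],
    "rs4": [2, 0, 1, 1, 0, 2],  # in perfect LD with rs3
    "rs5": [0, 0, 0, 1, 0, 0],  # rare variant tagged by nothing else
}
addon_tags = select_addon_tags(toy_genotypes, base_tags=["rs1"])
print(addon_tags)
```

In this toy case the base array carries only `rs1`, which also captures `rs2`. The heuristic then adds one SNP for the `rs3`/`rs4` block and one for the otherwise untagged `rs5`. A real design would also have to handle array capacity, probe design constraints, and LD windows, none of which are modeled here.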

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Genetics, Population* / methods
  • Genetics, Population* / standards
  • Genome-Wide Association Study* / methods
  • Genome-Wide Association Study* / standards
  • Genotype*
  • Humans
  • Male
  • Polymorphism, Single Nucleotide / genetics*
  • Tanzania

Grants and funding

This work was supported by the Swiss National Science Foundation (https://www.snf.ch; Grant No: CRSII5_177163, 310030_188888) and the European Research Council (https://erc.europa.eu/; Grant No: 883582). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.