Extent to which array genotyping and imputation with large reference panels approximate deep whole-genome sequencing

Sarah C Hanks; Lukas Forer; Sebastian Schönherr; Jonathon LeFaive; Taylor Martins; Ryan Welch; Sarah A Gagliano Taliun; David Braff; Jill M Johnsen; Eimear E Kenny; Barbara A Konkle; Markku Laakso; Ruth F J Loos; Steven McCarroll; Carlos Pato; Michele T Pato; Albert V Smith; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; Michael Boehnke; Laura J Scott; Christian Fuchsberger

doi:10.1016/j.ajhg.2022.07.012

Am J Hum Genet. 2022 Sep 1;109(9):1653-1666. doi: 10.1016/j.ajhg.2022.07.012. Epub 2022 Aug 17.

Authors

Sarah C Hanks¹, Lukas Forer², Sebastian Schönherr², Jonathon LeFaive¹, Taylor Martins¹, Ryan Welch¹, Sarah A Gagliano Taliun³, David Braff⁴, Jill M Johnsen⁵, Eimear E Kenny⁶, Barbara A Konkle⁷, Markku Laakso⁸, Ruth F J Loos⁹, Steven McCarroll¹⁰, Carlos Pato¹¹, Michele T Pato¹¹, Albert V Smith¹; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; Michael Boehnke¹, Laura J Scott¹, Christian Fuchsberger¹²

Affiliations

¹ Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, USA.
² Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria.
³ Department of Medicine and Department of Neurosciences, Université de Montréal, Montreal, QC, Canada; Research Centre, Montreal Heart Institute, Montreal, QC, Canada.
⁴ Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
⁵ Research Institute, Bloodworks, Seattle, WA, USA; Department of Medicine, University of Washington, Seattle, WA, USA.
⁶ Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁷ Department of Medicine, University of Washington, Seattle, WA, USA.
⁸ Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland.
⁹ Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹⁰ Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA.
¹¹ Departments of Psychiatry, Rutgers University, Robert Wood Johnson Medical School and New Jersey Medical School, New Brunswick, NJ, USA.
¹² Institute for Biomedicine (Affiliated with the University of Lübeck), Eurac Research, Bolzano, Italy.

Abstract

Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.

Keywords: genotype imputation; genotyping array; whole-genome sequencing.
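
The abstract calls a variant "well imputed" when r² > 0.8 and reports the minor-allele frequency (MAF) down to which this holds. The abstract does not define r², so the sketch below assumes a common definition: the squared Pearson correlation between sequence-derived genotypes (alternate-allele counts 0, 1, or 2) and imputed dosages (real values between 0 and 2). The function names and the toy data are hypothetical and are not taken from the paper or from RsqBrowser.

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """Return the MAF given alternate-allele counts (0, 1, or 2) per individual."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumption: this is the r² meant in the abstract. If either vector has
    zero variance the correlation is undefined; this sketch returns 0.0.
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("genotype and dosage vectors must have equal length")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def is_well_imputed(r2, threshold=0.8):
    """Apply the abstract's criterion: well imputed means r² strictly above 0.8."""
    return r2 > threshold


if __name__ == "__main__":
    # Toy data for one variant in ten individuals (illustrative only).
    wgs = [0, 0, 1, 0, 2, 0, 1, 0, 0, 0]
    dosages = [0.05, 0.0, 0.9, 0.1, 1.8, 0.0, 1.1, 0.2, 0.0, 0.05]
    r2 = imputation_r2(wgs, dosages)
    print("MAF:", minor_allele_frequency(wgs))
    print("r2:", round(r2, 3), "well imputed:", is_well_imputed(r2))
```

Grouping variants into MAF bins and taking the share with `is_well_imputed` true in each bin would give the kind of "≥90% well imputed down to MAF x" summary quoted in the abstract, under the same assumed definition.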

MeSH terms

Gene Frequency / genetics
Genome-Wide Association Study
Genotype
High-Throughput Nucleotide Sequencing*
Humans
Polymorphism, Single Nucleotide* / genetics
Whole Genome Sequencing
