Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity?

Hum Genet. 2020 Jan;139(1):23-41. doi: 10.1007/s00439-019-02014-8. Epub 2019 Apr 27.

Abstract

Genome-wide association studies (GWAS) have consistently found replicable genetic association signals in recent years. The recent dramatic expansion of study sizes improves power for estimating effect sizes, genomic prediction, causal inference, and detecting polygenic selection, but it simultaneously makes these methods more susceptible to bias due to subtle population structure. Standard methods that use genetic principal components to correct for structure might not always be appropriate, and we use a simulation study to illustrate when correction might be ineffective for avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of GWAS meta-analysis, for prediction of traits, and for causal inference.
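
The principal-component correction mentioned in the abstract can be sketched with a toy simulation. This is an illustrative sketch only, not the simulation study described in the paper: the two-population design, the function names (`simulate`, `top_pcs`, `per_snp_effects`), and all parameter values are assumptions chosen for clarity. The trait has no genetic component, so any nonzero SNP effect estimate reflects confounding by population structure. In this deliberately easy case the structure is strong and discrete, so one principal component captures it; the abstract's point is that subtler structure may not be captured this way.

```python
import numpy as np

def simulate(n_per_pop=500, n_snps=200, seed=0):
    """Two discrete populations; the trait has NO genetic component (assumed toy design)."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    # Allele frequencies drift apart between the two populations.
    base = rng.uniform(0.2, 0.8, size=n_snps)
    shift = rng.normal(0.0, 0.1, size=n_snps)
    freqs = np.clip(np.vstack([base - shift, base + shift]), 0.05, 0.95)
    genotypes = rng.binomial(2, freqs[pop]).astype(float)
    # The trait mean differs by population for non-genetic reasons only.
    trait = 1.0 * pop + rng.normal(size=pop.size)
    return genotypes, trait, pop

def top_pcs(genotypes, k=1):
    """Top-k principal component scores of the standardised genotype matrix."""
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

def per_snp_effects(genotypes, trait, covariates=None):
    """Per-SNP regression slopes after residualising on an intercept and covariates."""
    n = trait.size
    design = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])

    def residualise(m):
        coef, *_ = np.linalg.lstsq(design, m, rcond=None)
        return m - design @ coef

    g_res = residualise(genotypes)
    y_res = residualise(trait)
    return (g_res * y_res[:, None]).sum(axis=0) / (g_res ** 2).sum(axis=0)

genotypes, trait, pop = simulate()
pcs = top_pcs(genotypes, k=1)

# The true effect of every SNP is zero, so the mean absolute estimate measures spurious signal.
naive_bias = np.abs(per_snp_effects(genotypes, trait)).mean()
adjusted_bias = np.abs(per_snp_effects(genotypes, trait, covariates=pcs)).mean()

print("mean |effect| without PC adjustment:", naive_bias)
print("mean |effect| with PC1 adjustment:  ", adjusted_bias)
```

Comparing the two printed values shows how much of the apparent signal is attributable to structure in this toy setting. Reducing the frequency shift or the number of SNPs in `simulate` makes the structure harder for the leading component to recover.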

Publication types

  • Review

MeSH terms

  • Algorithms*
  • Biological Specimen Banks / statistics & numerical data*
  • Genetics, Population*
  • Genome-Wide Association Study*
  • Humans
  • Multifactorial Inheritance*
  • Polymorphism, Single Nucleotide