Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity?

Daniel John Lawson; Neil Martin Davies; Simon Haworth; Bilal Ashraf; Laurence Howe; Andrew Crawford; Gibran Hemani; George Davey Smith; Nicholas John Timpson

doi:10.1007/s00439-019-02014-8

Hum Genet. 2020 Jan;139(1):23-41. doi: 10.1007/s00439-019-02014-8. Epub 2019 Apr 27.

Authors

Daniel John Lawson¹, Neil Martin Davies², Simon Haworth², Bilal Ashraf², Laurence Howe³, Andrew Crawford², Gibran Hemani², George Davey Smith², Nicholas John Timpson²

Affiliations

¹ MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK.
² MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK.
³ Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, Gower Street, London, WC1E 6BT, UK.

Abstract

Replicable genetic association signals have consistently been found through genome-wide association studies in recent years. The recent dramatic expansion of study sizes improves power of estimation of effect sizes, genomic prediction, causal inference, and polygenic selection, but it simultaneously increases susceptibility of these methods to bias due to subtle population structure. Standard methods using genetic principal components to correct for structure might not always be appropriate and we use a simulation study to illustrate when correction might be ineffective for avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of GWAS meta-analysis, for prediction of traits, and for causal inference.
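The confounding the abstract describes, and the standard principal-component correction, can be illustrated with a minimal sketch. This is not the authors' simulation study: the two subpopulations, sample sizes, allele-frequency model, and the helper `slope` are all hypothetical choices made for illustration. The phenotype depends only on population membership, so any SNP association is spurious. The setting is deliberately favourable (two well-separated groups), which is exactly the case where a single principal component is expected to work; the abstract's warning concerns subtler structure where it might not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_pop, n_snps = 500, 200  # hypothetical sizes, chosen for speed

# Two hypothetical subpopulations with diverged allele frequencies.
base = rng.uniform(0.2, 0.8, n_snps)
shift = rng.normal(0.0, 0.1, n_snps)
freqs = np.clip(np.stack([base - shift, base + shift]), 0.05, 0.95)

pop = np.repeat([0, 1], n_per_pop)                      # population label per individual
genotypes = rng.binomial(2, freqs[pop]).astype(float)   # shape (n, n_snps), values 0/1/2

# Phenotype differs between populations for non-genetic reasons;
# no SNP has a causal effect in this sketch.
phenotype = 1.0 * pop + rng.normal(0.0, 1.0, pop.size)


def slope(y, x, covariates=None):
    """Least-squares coefficient of x on y, optionally adjusting for covariates."""
    columns = [np.ones_like(y), x]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


# Test the SNP whose frequency differs most between the two populations.
test_snp = int(np.argmax(np.abs(freqs[1] - freqs[0])))

# First principal component computed from all other SNPs.
others = np.delete(genotypes, test_snp, axis=1)
standardized = (others - others.mean(axis=0)) / others.std(axis=0)
u, s, _ = np.linalg.svd(standardized, full_matrices=False)
pc1 = u[:, 0] * s[0]

naive = slope(phenotype, genotypes[:, test_snp])
adjusted = slope(phenotype, genotypes[:, test_snp], pc1)
print(f"naive effect estimate:    {naive:.3f}")
print(f"PC-adjusted effect:       {adjusted:.3f}")
```

Under these assumptions the unadjusted estimate should be clearly non-zero despite the absence of any causal SNP, and adjusting for the first component should shrink it towards zero. Weakening the separation between groups (for example, a smaller spread in `shift`) is one way to explore when the correction starts to fail.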

Publication types

Review

MeSH terms

Algorithms*
Biological Specimen Banks / statistics & numerical data*
Genetics, Population*
Genome-Wide Association Study*
Humans
Multifactorial Inheritance*
Polymorphism, Single Nucleotide
