Controlling for polygenic genetic confounding in epidemiologic association studies

Proc Natl Acad Sci U S A. 2024 Oct 29;121(44):e2408715121. doi: 10.1073/pnas.2408715121. Epub 2024 Oct 21.

Abstract

Epidemiologic associations estimated from observational data are often confounded by genetics due to pervasive pleiotropy among complex traits. Many studies either neglect genetic confounding altogether or rely on adjusting for polygenic scores (PGS) in regression analysis. In this study, we show that, because of measurement error and model misspecification, the commonly employed PGS approach is inadequate for removing genetic confounding. To address this problem, we introduce PENGUIN, a principled framework for polygenic genetic confounding control based on variance component estimation. In addition, we present extensions of this approach that estimate genetically unconfounded associations from GWAS summary statistics alone and across multiple generations of study samples. Through simulations, we demonstrate superior statistical properties of PENGUIN compared with existing approaches. Applying our method to multiple population cohorts, we reveal and remove substantial genetic confounding in the associations of educational attainment with various complex traits and between parental and offspring education. Our results show that PENGUIN is an effective solution for genetic confounding control in observational data analysis, with broad applications in future epidemiologic association studies.
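
The measurement-error argument against PGS adjustment can be illustrated with a toy simulation. The sketch below is not PENGUIN and is not taken from the paper; it assumes a single shared genetic component g that drives both the exposure and the outcome, no causal effect of exposure on outcome, and a PGS equal to g plus independent noise. All variable names and variance settings are hypothetical choices made for illustration.

```python
import numpy as np


def ols_slope(y, *covariates):
    """Return the OLS coefficient on the first covariate (intercept included)."""
    design = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 50_000

# Assumed toy model: one shared genetic component confounds X and Y.
g = rng.normal(size=n)            # true genetic component (unobserved in practice)
x = g + rng.normal(size=n)        # exposure; depends on g
y = g + rng.normal(size=n)        # outcome; depends on g, NOT on x (true effect is 0)
pgs = g + rng.normal(size=n)      # polygenic score: g measured with error

naive = ols_slope(y, x)              # no genetic adjustment
pgs_adjusted = ols_slope(y, x, pgs)  # adjust for the noisy PGS
oracle = ols_slope(y, x, g)          # adjust for the true genetic component

print(f"naive:        {naive:.3f}")
print(f"PGS-adjusted: {pgs_adjusted:.3f}")
print(f"oracle:       {oracle:.3f}")
```

Under these assumed variances, the population values of the three slopes are 1/2, 1/3, and 0: adjusting for the noisy PGS shrinks the spurious association but does not remove it, whereas adjusting for the true genetic component does. The oracle adjustment is unavailable in real data, which is the gap a variance-component approach is meant to close.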

Keywords: GWAS summary statistics; association study; genetic confounding.

MeSH terms

  • Confounding Factors, Epidemiologic
  • Genome-Wide Association Study* / methods
  • Humans
  • Models, Genetic
  • Multifactorial Inheritance* / genetics