Interpreting SNP heritability in admixed populations

bioRxiv [Preprint]. 2024 Aug 6:2023.08.04.551959. doi: 10.1101/2023.08.04.551959.

Abstract

SNP heritability $h^2_{snp}$ is defined as the proportion of phenotypic variance explained by genotyped SNPs and is believed to be a lower bound of heritability ($h^2$), being equal to it if all causal variants are known. Despite the simple intuition behind $h^2_{snp}$, its interpretation and equivalence to $h^2$ are unclear, particularly in the presence of population structure and assortative mating. It is well known that population structure can lead to inflation in $\hat{h}^2_{snp}$ estimates because of confounding due to linkage disequilibrium (LD) or shared environment. Here we use analytical theory and simulations to demonstrate that $h^2_{snp}$ estimates can be biased in admixed populations, even in the absence of confounding and even if all causal variants are known. This is because admixture generates LD, which contributes to the genetic variance, and therefore to heritability. Genome-wide restricted maximum likelihood (GREML) does not capture this contribution, leading to under- or over-estimates of $h^2_{snp}$ relative to $h^2$, depending on the genetic architecture. In contrast, Haseman-Elston (HE) regression exaggerates the LD contribution, leading to biases in the opposite direction. For the same reason, GREML and HE estimates of local ancestry heritability $h^2_{\gamma}$ are also biased. We describe this bias in $\hat{h}^2_{snp}$ and $\hat{h}^2_{\gamma}$ as a function of admixture history and the genetic architecture of the trait and show that it can be recovered under some conditions. We clarify the interpretation of $\hat{h}^2_{snp}$ in admixed populations and discuss its implication for genome-wide association studies and polygenic prediction.
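The claim that admixture-induced LD contributes to the genetic variance can be illustrated with a small simulation. The sketch below is not the authors' simulation code and does not run GREML or HE regression. It only splits the variance of an additive genetic value into a genic part (the sum of per-locus variances weighted by squared effects) and an LD part (the sum of between-locus covariances weighted by effect products). The function names, the Uniform(0, 1) ancestry distribution, the allele frequencies, and the effect sizes are all illustrative assumptions.

```python
import random


def simulate_genotypes(p_a, p_b, n, rng):
    """Draw n diploid genotypes from a simple admixture model (an assumption).

    Each individual gets an ancestry fraction theta from source population A.
    Given theta, loci are drawn independently, so any covariance between loci
    comes only from variation in ancestry across individuals.
    """
    genotypes = []
    for _ in range(n):
        theta = rng.random()  # assumed Uniform(0, 1) ancestry fraction
        row = []
        for freq_a, freq_b in zip(p_a, p_b):
            f = theta * freq_a + (1 - theta) * freq_b  # expected allele frequency
            row.append(int(rng.random() < f) + int(rng.random() < f))
        genotypes.append(row)
    return genotypes


def covariance(x, y):
    """Population (divide-by-n) covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def decompose_genetic_variance(genotypes, beta):
    """Return (total, genic, ld) where total = genic + ld.

    genic: sum_i beta_i^2 Var(x_i)
    ld:    sum_{i != j} beta_i beta_j Cov(x_i, x_j)
    total: Var(sum_i beta_i x_i), computed directly from genetic values
    """
    columns = list(zip(*genotypes))
    m = len(beta)
    genic = sum(beta[i] ** 2 * covariance(columns[i], columns[i]) for i in range(m))
    ld = sum(
        beta[i] * beta[j] * covariance(columns[i], columns[j])
        for i in range(m)
        for j in range(m)
        if i != j
    )
    genetic_values = [sum(b * x for b, x in zip(beta, row)) for row in genotypes]
    total = covariance(genetic_values, genetic_values)
    return total, genic, ld


if __name__ == "__main__":
    rng = random.Random(1)
    m = 20
    p_a = [0.9] * m  # illustrative allele frequencies in source population A
    p_b = [0.1] * m  # illustrative allele frequencies in source population B
    genotypes = simulate_genotypes(p_a, p_b, 2000, rng)

    # Architecture 1: every trait-increasing allele is more common in A.
    aligned = [1.0] * m
    # Architecture 2: trait-increasing alleles alternate between A and B.
    alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(m)]

    for name, beta in (("aligned", aligned), ("alternating", alternating)):
        total, genic, ld = decompose_genetic_variance(genotypes, beta)
        print(f"{name}: total={total:.2f} genic={genic:.2f} ld={ld:.2f}")
```

Under these assumptions the LD term is positive when trait-increasing alleles are consistently more common in one source population and negative when their direction alternates, which is one way to see why the direction of the bias described in the abstract depends on the genetic architecture.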

Publication types

  • Preprint