Allele frequency impacts the cross-ancestry portability of gene expression prediction in lymphoblastoid cell lines

Am J Hum Genet. 2024 Nov 12:S0002-9297(24)00378-1. doi: 10.1016/j.ajhg.2024.10.009. Online ahead of print.

Abstract

Population-level genetic studies are overwhelmingly biased toward European ancestries. Transferring genetic predictions from European ancestries to other ancestries results in a substantial loss of accuracy. Yet, it remains unclear how much various genetic factors, such as causal effect differences, linkage disequilibrium (LD) differences, or allele frequency differences, contribute to the loss of prediction accuracy across ancestries. In this study, we used gene expression levels in lymphoblastoid cell lines to understand how much each genetic factor contributes to lowered portability of gene expression prediction from European to African ancestries. We found that cis-genetic effects on gene expression are highly similar between European and African individuals. However, we found that allele frequency differences of causal variants have a striking impact on prediction portability. For example, portability is reduced by more than 32% when the causal cis-variant is common (minor allele frequency, MAF >5%) in European samples (training population) but is rarer (MAF <5%) in African samples (prediction population). While large allele frequency differences can decrease portability through increasing LD differences, we also determined that causal allele frequency can significantly impact portability when the impact from LD is substantially controlled. This observation suggests that improving statistical fine-mapping alone does not overcome the loss of portability resulting from differences in causal allele frequency. We conclude that causal cis-eQTL effects are highly similar in European and African individuals, and allele frequency differences have a large impact on the accuracy of gene expression prediction.
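
One way to see why causal allele frequency can matter even when effects are shared is a textbook single-variant calculation. It is not the analysis used in this study. The sketch assumes one additive biallelic causal variant in Hardy-Weinberg equilibrium, an identical per-allele effect in both populations, and the same residual variance. Under those assumptions the variant contributes genetic variance 2p(1-p)β², so a lower frequency p in the prediction population lowers the attainable R². The function names and the numeric inputs are hypothetical and purely illustrative.

```python
def variance_explained(maf: float, beta: float) -> float:
    """Genetic variance from one additive biallelic variant under HWE: 2p(1-p)*beta^2."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def expected_r2(maf: float, beta: float, residual_var: float) -> float:
    """Best-case R^2 of a predictor that knows the true causal variant and effect."""
    genetic_var = variance_explained(maf, beta)
    return genetic_var / (genetic_var + residual_var)


def relative_portability(maf_train: float, maf_target: float,
                         beta: float, residual_var: float) -> float:
    """Ratio of expected R^2 in the target population to that in the training population."""
    return (expected_r2(maf_target, beta, residual_var)
            / expected_r2(maf_train, beta, residual_var))


if __name__ == "__main__":
    # Illustrative values only (assumptions, not estimates from the study):
    # the variant is common in the training sample and rare in the target sample.
    ratio = relative_portability(maf_train=0.30, maf_target=0.03,
                                 beta=0.5, residual_var=1.0)
    print(f"relative portability: {ratio:.3f}")
```

Because the effect size is held equal across populations, any drop in this ratio comes from allele frequency alone, which is consistent with the observation that better fine-mapping by itself would not restore portability.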

Keywords: eQTL; expression quantitative trait loci; gene expression; portability; statistical genetics; transcriptome.