Cross-population enhancement of PrediXcan predictions with a gnomAD-based east Asian reference framework

Brief Bioinform. 2024 Sep 23;25(6):bbae549. doi: 10.1093/bib/bbae549.

Abstract

Over the past decade, genome-wide association studies have identified thousands of variants significantly associated with complex traits. For each locus, gene expression levels are needed to further explore its biological functions. To address this, the PrediXcan algorithm leverages large-scale reference data to impute gene expression levels from single-nucleotide polymorphisms, so that gene-trait associations can be tested to identify candidate causal genes. However, most reference data come from subjects of European ancestry, and the accuracy and robustness of predicted gene expression in subjects of East Asian (EAS) ancestry remain unclear. Here, we first simulated a variety of scenarios to explore the impact of the level of population diversity on gene expression. Population-differentiated variants were identified using allele frequency information from the Genome Aggregation Database (gnomAD). We found that the weight of a variant was the main factor affecting gene expression predictions, and that ~70% of variants were significantly population differentiated based on proportion tests. To provide insights into this population effect on gene expression levels, we used the allele frequency information to develop a gene expression reference panel, Predict Asian-Population (PredictAP), for EAS ancestry. PredictAP can be viewed as an auxiliary tool for PrediXcan when using genotype data from EAS subjects.
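
As an illustration of the two ideas the abstract relies on, the following minimal Python sketch shows (1) a PrediXcan-style prediction computed as a weighted sum of allele dosages and (2) a two-sided two-proportion z-test comparing the allele frequency of a variant between two populations. The abstract does not state which proportion test or software was used, so the pooled z-test here is an assumption, and all function names, SNP identifiers, weights and counts are hypothetical.

```python
import math


def predict_expression(weights, dosages):
    """Weighted sum of allele dosages (0 to 2) over the SNPs in a gene model.

    `weights` maps SNP id -> model weight; `dosages` maps SNP id -> dosage.
    SNPs missing from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def two_proportion_z_test(ac1, an1, ac2, an2):
    """Two-sided pooled z-test for a difference in allele frequency.

    ac = alternate allele count, an = total allele number, per population.
    Returns (z, p_value). This is one common proportion test, not
    necessarily the one used in the paper.
    """
    p1, p2 = ac1 / an1, ac2 / an2
    pooled = (ac1 + ac2) / (an1 + an2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / an1 + 1 / an2))
    if se == 0:
        # Allele is absent or fixed in both populations: no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical gene model and genotype for one subject.
    weights = {"rs_example_1": 0.5, "rs_example_2": -0.2}
    dosages = {"rs_example_1": 2, "rs_example_2": 1}
    print(predict_expression(weights, dosages))

    # Hypothetical allele counts in two populations for one variant.
    z, p = two_proportion_z_test(100, 1000, 300, 1000)
    print(z, p)
```

Because the predicted expression is linear in the dosages, a variant with a large weight and a large frequency difference between populations shifts the population-level prediction more than a differentiated variant with a small weight, which is consistent with the abstract's observation that variant weight was the main factor.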

Keywords: PrediXcan; population diversity; single-nucleotide polymorphism; transcriptome-wide association studies (TWAS).

MeSH terms

  • Algorithms*
  • Asia, Eastern
  • Databases, Genetic
  • East Asian People* / genetics
  • Gene Frequency*
  • Genetics, Population
  • Genome-Wide Association Study* / methods
  • Humans
  • Polymorphism, Single Nucleotide*