Considering strategies for SNP selection in genetic and polygenic risk scores

Front Genet. 2022 Oct 25:13:900595. doi: 10.3389/fgene.2022.900595. eCollection 2022.

Abstract

Genetic risk scores (GRS) and polygenic risk scores (PRS) are weighted sums of genetic variant indicator variables; a GRS combines several variants, whereas a PRS combines many. Although these scores are increasingly proposed for clinical use, the best ways to construct them are still actively debated. In this commentary, we present several case studies illustrating practical challenges associated with building scores, or attempting to improve their performance, when disease risk is expected to be heterogeneous between cohorts or between subgroups of individuals. Specifically, we contrast the performance of several ways of selecting single nucleotide polymorphisms (SNPs) for inclusion in these scores. Treating GRS and PRS as predictors that are measured with error offers insights into their strengths and weaknesses, and SNP selection approaches play an important role in defining such errors.

Keywords: feature selection; high-dimensional data; instrumental variable methods; measurement error; Mendelian randomization; polygenic risk scores; regularized models.
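
The weighted-sum definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `risk_score`, the coding of each variant as an allele count (0, 1, or 2), the `selected` argument, and all numeric values are assumptions made purely for illustration.

```python
def risk_score(dosages, weights, selected=None):
    """Weighted sum of variant indicators for one individual.

    dosages  : per-SNP allele counts (assumed coding: 0, 1, or 2)
    weights  : per-SNP weights (hypothetical effect sizes)
    selected : optional indices of SNPs to include; None means all SNPs
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    indices = range(len(dosages)) if selected is None else selected
    return sum(weights[i] * dosages[i] for i in indices)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
weights = [0.5, -0.25, 1.0]
dosages = [2, 1, 0]

score_all = risk_score(dosages, weights)                      # all three SNPs
score_subset = risk_score(dosages, weights, selected=[0, 2])  # two selected SNPs
```

Because the score depends on which SNPs enter the sum, the same individual receives a different value under each selection, which is one simple way to see why the choice of SNPs shapes the error in the score.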