Comparing Ancestry Standardization Approaches for a Transancestry Colorectal Cancer Polygenic Risk Score

Elisabeth A Rosenthal; Li Hsu; Minta Thomas; Ulrike Peters; Christopher Kachulis; Karynne Patterson; Gail P Jarvik

doi:10.1002/gepi.22590

Genet Epidemiol. 2025 Jan;49(1):e22590. doi: 10.1002/gepi.22590. Epub 2024 Sep 24.

Authors

Elisabeth A Rosenthal¹, Li Hsu², Minta Thomas², Ulrike Peters², Christopher Kachulis³, Karynne Patterson⁴, Gail P Jarvik¹,⁴

Affiliations

¹ Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington, USA.
² Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
³ Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
⁴ Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

PMID: 39315597

Abstract

Colorectal cancer (CRC) is a complex disease with monogenic, polygenic, and environmental risk factors. Polygenic risk scores (PRSs) aim to identify individuals at high polygenic risk. Because genetic background differs across populations, PRS distributions vary by ancestry, necessitating standardization. We compared four post-hoc standardization methods for a transancestry CRC PRS using whole genome sequence data from the All of Us Research Program. The methods were linear models that differed in (A) the training data, either the entire data set or an ancestrally diverse subset, and (B) the covariates, either principal components of ancestry or admixture. Standardization with the training subset also adjusted the variance. All methods performed similarly within ancestry, with OR (95% C.I.) per s.d. change in PRS: African 1.5 (1.02, 2.08), Admixed American 2.2 (1.27, 3.85), European 1.6 (1.43, 1.89), and Middle Eastern 1.1 (0.71, 1.63). Using admixture and an ancestrally diverse training set provided distributions closest to standard Normal. Training a model on ancestrally diverse participants, adjusting both the mean and variance using admixture as covariates, created standard Normal z-scores, which can be used to identify patients at high polygenic risk. These scores can be incorporated into comprehensive risk calculation including other known risk factors, allowing for more precise risk estimates.
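
The mean-and-variance adjustment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common way of doing such an adjustment, in which the raw PRS is regressed on the admixture fractions to model the mean and the squared residuals are regressed on the same covariates to model the variance. The function names, the simulated three-ancestry data, and the 1.645 cutoff are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_standardizer(prs, covariates):
    """Fit linear models for the PRS mean and variance on a training subset.

    Assumption: ordinary least squares for both models; the paper may differ.
    """
    X = np.column_stack([np.ones(len(prs)), covariates])
    # Mean model: PRS ~ intercept + ancestry covariates.
    beta_mean, *_ = np.linalg.lstsq(X, prs, rcond=None)
    # Variance model: squared residuals ~ intercept + ancestry covariates.
    resid_sq = (prs - X @ beta_mean) ** 2
    beta_var, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)
    return beta_mean, beta_var


def standardize(prs, covariates, beta_mean, beta_var, min_var=1e-8):
    """Convert raw PRS values to z-scores using the fitted models."""
    X = np.column_stack([np.ones(len(prs)), covariates])
    mean = X @ beta_mean
    # Floor the predicted variance so the square root is always defined.
    var = np.maximum(X @ beta_var, min_var)
    return (prs - mean) / np.sqrt(var)


# Simulated data (hypothetical): admixture fractions over three ancestries.
rng = np.random.default_rng(0)
n = 20000
admix = rng.dirichlet([1.0, 1.0, 1.0], size=n)

# Raw PRS whose mean and variance both shift with admixture.
true_mean = admix @ np.array([0.0, 3.0, -2.0])
true_var = admix @ np.array([1.0, 4.0, 0.25])
raw_prs = true_mean + np.sqrt(true_var) * rng.standard_normal(n)

# Fractions sum to 1, so one column is dropped to avoid collinearity
# with the intercept.
covs = admix[:, :2]

# Train on a subset, then apply the fitted models to everyone.
train = slice(0, n // 2)
beta_mean, beta_var = fit_standardizer(raw_prs[train], covs[train])
z = standardize(raw_prs, covs, beta_mean, beta_var)

# Illustrative high-risk flag: upper tail of the standardized score.
high_risk = z > 1.645
print(f"z mean={z.mean():.3f}, z sd={z.std():.3f}, "
      f"flagged={high_risk.mean():.3f}")
```

In this simulation the variance is linear in the admixture fractions by construction, so the linear variance model is correctly specified. With real data that is an assumption to check, for example by inspecting the z-score distribution within each ancestry group.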

Keywords: admixture; all of us; colorectal cancer; polygenic risk score; transancestry.

Publication types

Comparative Study

MeSH terms

Colorectal Neoplasms* / genetics
Ethnicity / genetics
Female
Genetic Risk Score*
Genome-Wide Association Study / standards
Humans
Male
Middle Aged
Polymorphism, Single Nucleotide
Racial Groups / genetics

Grants and funding

This work was funded by the Office of the Director at the National Institutes of Health, under award notice 1OT2OD002748-01 and by the NHGRI through the grant U01HG008657.