Population-aware permutation-based significance thresholds for genome-wide association studies

Maura John; Arthur Korte; Marco Todesco; Dominik G Grimm

doi:10.1093/bioadv/vbae168

Bioinform Adv. 2024 Oct 28;4(1):vbae168. doi: 10.1093/bioadv/vbae168. eCollection 2024.

Authors

Maura John¹ ², Arthur Korte³, Marco Todesco⁴ ⁵ ⁶, Dominik G Grimm¹ ² ⁷

Affiliations

¹ Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany.
² Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany.
³ Faculty of Biology, University of Würzburg, 97074 Würzburg, Germany.
⁴ Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
⁵ Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
⁶ Department of Biology, University of British Columbia, Kelowna, BC V1V 1V7, Canada.
⁷ Technical University of Munich, TUM School of Computation, Information and Technology, 85748 Garching, Germany.

Abstract

Motivation: Permutation-based significance thresholds have been shown to be a robust alternative to classical Bonferroni significance thresholds in genome-wide association studies (GWAS) for skewed phenotype distributions. The recently published method permGWAS introduced a batch-wise approach to efficiently compute permutation-based GWAS. However, running multiple univariate tests in parallel leads to many redundant computations and increased computational resource requirements. More importantly, traditional permutation methods that permute only the phenotype break the underlying population structure.
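To make the idea of a permutation-based threshold concrete, the following is a minimal sketch of the traditional phenotype-only scheme mentioned above. It is not the permGWAS or permGWAS2 implementation: the function names `abs_correlations` and `permutation_threshold` are hypothetical, the data are simulated, and a simple per-marker Pearson correlation stands in for the linear mixed model used in practice. Each permutation shuffles the phenotype, records the strongest association over all markers, and the threshold is taken as the (1 - alpha) quantile of those maxima. Because the shuffle ignores relatedness between samples, this is exactly the kind of permutation that breaks population structure.

```python
import numpy as np


def abs_correlations(X, y):
    """Absolute Pearson correlation between each marker (column of X) and y.

    Assumes no marker is constant across samples (otherwise the
    denominator would be zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def permutation_threshold(X, y, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold from the null distribution of the maximum statistic.

    Hypothetical helper: shuffles only the phenotype, so any population
    structure linking genotype and phenotype is destroyed.
    """
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)  # phenotype-only shuffle
        max_stats[i] = abs_correlations(X, y_perm).max()
    # A marker is called significant if its statistic exceeds this quantile.
    return np.quantile(max_stats, 1.0 - alpha)


# Simulated example: 200 samples, 500 markers coded 0/1/2, skewed phenotype.
rng = np.random.default_rng(1)
n_samples, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)
y = rng.exponential(size=n_samples)

threshold = permutation_threshold(X, y)
significant = np.flatnonzero(abs_correlations(X, y) > threshold)
```

The threshold here is expressed on the scale of the test statistic; for a fixed sample size the p-value is a monotone function of the absolute correlation, so taking the alpha quantile of the per-permutation minimum p-values would select the same markers.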

Results: We propose permGWAS2, an improved method that does not break the population structure during permutations and uses an elegant block matrix decomposition to optimize computations, thereby reducing redundancies. We show on synthetic data that this improved approach yields a lower false discovery rate for skewed phenotype distributions compared to the previous version and the commonly used Bonferroni correction. In addition, we re-analyze a dataset covering phenotypic variation in 86 traits in a population of 615 wild sunflowers (Helianthus annuus L.). This led to the identification of dozens of novel associations with putatively adaptive traits, and removed several likely false-positive associations with limited biological support.

Availability and implementation: permGWAS2 is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.