Motivation: Multiple independently associated SNPs within a linkage disequilibrium region are a common phenomenon, and conditional analysis has been successful in identifying such secondary signals. Although conditional association tests are restricted to specific genomic regions, they are benchmarked against the genome-wide significance criterion, which is a conservative strategy. Within the weighted hypothesis testing framework, we developed a 'quasi-adaptive' method that uses the pairwise correlation (r²) with, and the physical distance (d) from, the index association to construct priority functions G = G(r², d), which assign an SNP-specific α-threshold to each SNP. The family-wise error rate (FWER) and power of the approach were evaluated via simulations based on real GWAS data, and a series of different G-functions was compared.
Results: Simulations under the null hypothesis on 1,100 primary SNPs confirmed an appropriate empirical FWER for all G-functions. The G-function with the best overall power had its optimum at r² = 0.3 between index and secondary SNP, strongly down-weighted SNPs at greater distances in a step-wise manner and placed more emphasis on d than on r². It also gave the best results when applied to the real datasets. As a proof of concept, the 'quasi-adaptive' method was applied to GWAS on free thyroxine (FT4), inflammatory bowel disease (IBD) and human height. The algorithm revealed 5 secondary signals for FT4, 5 for IBD and 19 for human height that would have gone undetected with the established genome-wide threshold (α = 5 × 10⁻⁸).
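To make the idea of SNP-specific α-thresholds concrete, the following is a minimal sketch of a weighted Bonferroni allocation driven by a priority function. It is not the published implementation: the function `priority`, its linear r² term, the distance cut-offs (100 kb and 250 kb), the step weights and the total error budget `alpha=0.05` are all hypothetical choices made for illustration. The sketch only mirrors the qualitative description above (optimum at r² = 0.3, step-wise down-weighting with distance, stronger emphasis on d than on r²).

```python
import math


def priority(r2, d_kb, r2_opt=0.3):
    """Hypothetical priority function G(r2, d); NOT the published G-function."""
    # r2 term (assumed shape): equals 1 at r2_opt and falls linearly
    # to 0.5 at the end of [0, 1] farthest from r2_opt.
    r2_term = 1.0 - 0.5 * abs(r2 - r2_opt) / max(r2_opt, 1.0 - r2_opt)
    # Distance term (assumed cut-offs): step-wise down-weighting. Its range
    # (0.05 to 1) is wider than that of the r2 term (0.5 to 1), so distance
    # dominates the priority.
    if d_kb <= 100:
        d_term = 1.0
    elif d_kb <= 250:
        d_term = 0.25
    else:
        d_term = 0.05
    return r2_term * d_term


def snp_thresholds(snps, alpha=0.05):
    """Split a total error budget alpha across SNPs in proportion to G.

    snps: list of (r2, d_kb) pairs relative to the index SNP.
    The thresholds sum to alpha, so the union bound keeps FWER <= alpha.
    """
    weights = [priority(r2, d_kb) for r2, d_kb in snps]
    total = sum(weights)
    return [alpha * w / total for w in weights]


# Illustrative (made-up) candidate SNPs around one index SNP.
candidates = [(0.30, 50), (0.30, 400), (0.90, 50), (0.05, 200)]
thresholds = snp_thresholds(candidates)
for (r2, d_kb), t in zip(candidates, thresholds):
    print(f"r2={r2:.2f} d={d_kb:>3} kb  alpha_i={t:.3e}")
```

Because the per-SNP thresholds sum to the chosen budget, any non-negative priority function can be substituted for `priority` without changing the FWER argument; only the distribution of power across SNPs changes.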
Availability and implementation: https://github.com/sghasemi64/Secondary-Signal.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved.