Motivation: In complex disorders, pairs of independently evolving loci might interact to confer disease susceptibility, while each locus has only a modest effect on its own. In genome-wide association studies (GWAS) on large cohorts, testing all pairs for interaction imposes a heavy computational burden and loses power because of large Bonferroni-like corrections. Limiting the tests to pairs that show a marginal effect at either locus also reduces power. Here, we describe an algorithm that discovers interacting locus pairs without explicitly testing all pairs and without requiring a marginal effect at each locus. The central idea is a mathematical transformation that maps 'statistical correlation between locus pairs' to 'distance between two points in a Euclidean space'. This makes it possible to use geometric properties to identify proximal points (correlated locus pairs) without testing each pair explicitly. For large datasets (~10^6 SNPs), this reduces the number of tests from roughly 10^12 to 10^6, substantially lowering the computational burden without loss of power. The speed of the test makes multiple-testing correction by permutation feasible. The algorithm is implemented in a tool called RAPID (RApid Pair IDentification) for identifying paired interactions in case-control GWAS.
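To make the central transformation concrete, here is a minimal Python sketch under stated assumptions; it is not RAPID's actual algorithm. It assumes each locus is summarized as a numeric vector (for example, genotype codes across individuals). Centering each vector and scaling it to unit length turns it into a point p such that, for any two points, ||p - q||^2 = 2(1 - r), where r is the Pearson correlation of the original vectors. A correlation threshold r >= r_min therefore becomes a distance threshold ||p - q|| <= sqrt(2(1 - r_min)). The names `to_unit_point`, `pearson`, `dist` and `correlated_pairs` are hypothetical, and the pruning step is a generic pivot filter based on the triangle inequality, swapped in here only to illustrate how geometry lets a search skip most pairs.

```python
import math
from typing import List, Tuple


def to_unit_point(v: List[float]) -> List[float]:
    """Center v and scale it to unit Euclidean norm (hypothetical helper)."""
    n = len(v)
    mean = sum(v) / n
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0.0:
        # A constant vector has undefined correlation with anything.
        raise ValueError("constant vector cannot be normalized")
    return [x / norm for x in centered]


def pearson(u: List[float], v: List[float]) -> float:
    """Plain Pearson correlation, used only to check the distance identity."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def dist(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def correlated_pairs(vectors: List[List[float]], r_min: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose correlation is at least r_min.

    Assumption: pruning uses one pivot point. By the triangle inequality,
    |d(i, pivot) - d(j, pivot)| <= d(i, j), so after sorting points by their
    distance to the pivot, only points within `radius` of each other in that
    ordering can be close. This filter is exact (no pair is missed).
    """
    points = [to_unit_point(v) for v in vectors]
    if not points:
        return []
    radius = math.sqrt(2.0 * (1.0 - r_min))  # correlation threshold -> distance
    pivot = points[0]
    keyed = sorted((dist(p, pivot), i) for i, p in enumerate(points))
    hits = []
    for a in range(len(keyed)):
        da, i = keyed[a]
        b = a + 1
        # Stop scanning once pivot distances differ by more than the radius.
        while b < len(keyed) and keyed[b][0] - da <= radius:
            j = keyed[b][1]
            if dist(points[i], points[j]) <= radius:
                hits.append((min(i, j), max(i, j)))
            b += 1
    return sorted(hits)
```

For example, with `vecs = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 1], [2, 1, 0, 2, 1, 0], [1, 0, 1, 2, 2, 0]]`, calling `correlated_pairs(vecs, 0.8)` returns `[(0, 1)]`. A single pivot still leaves a quadratic worst case; practical speedups come from spatial indexes over many dimensions or pivots, and how RAPID itself exploits the geometry is described in the full paper, not in this sketch.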
Results: We validated RAPID with extensive tests on simulated and real datasets. On simulated models of interaction, RAPID readily identified pairs with small marginal effects. On the benchmark disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), RAPID ran in about 1 CPU-hour per dataset and identified many significant interactions. In many cases, the interacting loci were known to be important for the disease but were not individually associated in the genome-wide scan.
Availability: http://bix.ucsd.edu/projects/rapid.