Objectives: Genetic association studies are usually based upon restricted sets of 'tag' markers selected to represent the total sequence variation. Tag selection is often determined by some threshold for the r(2) coefficients of linkage disequilibrium (LD) between tag and untyped markers, it being widely assumed that power to detect an effect at the untyped sites is retained by typing the tag marker in a sample scaled by the inverse of the selected threshold (1/r(2)). However, unless only a single causal variant occurs at a locus, it has been shown [Eur J Hum Genet 2006;14:426-437] that significant power loss can occur if this principle is applied. We sought to investigate whether unexpected loss of power might be an exceptional case or more general concern. In the absence of detailed knowledge about the genetic architecture at complex disease loci, we developed a mathematical approach to test all possible situations.
Methods: We derived mathematical formulae allowing the calculation of all possible odds ratios (OR) at a tag marker locus given the effect size that would be observed by typing a second locus and the r(2) between the two loci. For a range of allele frequencies, r(2) between loci, and strengths of association at the causal locus (OR from 0.5 to 2) that we consider realistic for complex disease loci, we next determined the sample sizes that would be necessary to give equivalent power to detect association by genotyping tag and causal loci and compared these with the sample sizes predicted by applying 1/r(2).
Results: Under most of the hypothetical scenarios we examined, the calculated sample sizes required to maintain power by typing markers that tag the causal locus at even moderately high r(2) (0.8) were greater than that calculated by applying 1/r(2). Even in populations with apparently similar measurements of allele frequency, LD structure, and effect size at the susceptibility allele, the required sample size to detect association with a tag marker can vary substantially. We also show that in apparently similar populations, associations to either allele at the tag site are possible.
Conclusions: Indirect tests of association are less powered than sizes predicted by applying 1/r(2) in the majority of hypothetical scenarios we examined. Our findings pertain even for what we consider likely to be larger than average effect sizes in complex diseases (OR = 1.5-2) and even for moderately high r(2) values between the markers. Until a substantial number of disease genes have been identified through methods that are not based on tagging, and therefore biased towards those situations most favourable to tagging, it is impossible to know how the true scenarios are distributed across the range of possible scenarios. Nevertheless, while association designs based upon tag marker selection by necessity are the tool of choice for de novo gene discovery, our data suggest power to initially detect association may often be less than assumed. Moreover, our data suggest that to avoid genuine findings being subsequently discarded by unpredictable losses of power, follow up studies in other samples should be based upon more detailed analyses of the gene rather than simply on the tag SNPs showing association in the discovery study.
(c) 2007 S. Karger AG, Basel