We propose a cost-effective two-stage approach to investigate gene-disease associations when testing a large number of candidate markers using a case-control design. Under this approach, all the markers are genotyped and tested at stage 1 using a subset of affected cases and unaffected controls; the most promising markers are then genotyped on the remaining individuals and tested using all the individuals at stage 2. The sample size at stage 1 is chosen such that the power to detect the true markers of association is 1 − β₁ at significance level α₁. The most promising markers are tested at significance level α₂ at stage 2. In contrast, a one-stage approach would genotype and test all the markers on all the cases and controls to identify the markers significantly associated with the disease. The goal is to determine the two-stage parameters (α₁, β₁, α₂) that minimize the cost of the study, subject to an overall significance level of α and a power close to 1 − β, the power of the one-stage approach. We provide analytic formulae to estimate the two-stage parameters. The properties of the two-stage approach are evaluated under various parametric configurations and compared with those of the corresponding one-stage approach. The optimal two-stage procedure does not depend on the signal of the markers associated with the disease. Further, when there is a large number of markers, the optimal procedure is not substantially influenced by the total number of markers associated with the disease. The results show that, compared to a one-stage approach, a two-stage procedure typically halves the cost of the study.
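As a rough illustration of the cost comparison described above, the following sketch computes the expected genotyping cost of a two-stage design relative to a one-stage design. It is not the analytic formulae of the paper. It assumes a one-sided z-test whose required sample size is proportional to the squared sum of the normal quantiles for the significance level and the power, and it measures cost as the number of marker-by-individual genotypes. The function names and the arguments `m` (total markers) and `d` (disease-associated markers) are hypothetical, `alpha1` and `beta1` denote the stage 1 parameters, and the sketch neither chooses the stage 2 significance level nor checks the overall significance and power constraints.

```python
from statistics import NormalDist


def stage1_fraction(alpha, beta, alpha1, beta1):
    """Stage 1 sample size as a fraction of the one-stage sample size.

    Assumption: one-sided z-test, so the required sample size is
    proportional to (z_{1-alpha} + z_{1-beta})^2. The effect size
    cancels in the ratio, so it does not appear here.
    """
    z = NormalDist().inv_cdf
    one_stage = z(1 - alpha) + z(1 - beta)
    stage1 = z(1 - alpha1) + z(1 - beta1)
    fraction = (stage1 / one_stage) ** 2
    if fraction > 1:
        raise ValueError("stage 1 would need more individuals than the one-stage study")
    return fraction


def relative_cost(m, d, alpha, beta, alpha1, beta1):
    """Expected two-stage genotyping cost divided by the one-stage cost.

    Assumption: cost is the number of marker-by-individual genotypes.
    """
    f = stage1_fraction(alpha, beta, alpha1, beta1)
    # Expected share of markers carried to stage 2: false positives among
    # the m - d null markers plus true markers detected at stage 1.
    carried = ((m - d) * alpha1 + d * (1 - beta1)) / m
    # Stage 1 genotypes all markers on a fraction f of the individuals;
    # stage 2 genotypes the carried markers on the remaining 1 - f.
    return f + carried * (1 - f)


if __name__ == "__main__":
    # Hypothetical configuration: 1000 markers, 10 truly associated,
    # Bonferroni-style one-stage level 0.05 / 1000, one-stage power 0.8.
    cost = relative_cost(m=1000, d=10, alpha=0.05 / 1000, beta=0.2,
                         alpha1=0.05, beta1=0.05)
    print(f"relative cost of two-stage design: {cost:.3f}")
```

Under these assumptions the effect size cancels out of the stage 1 sample fraction, which is one simple way to see why the choice of stage 1 parameters need not depend on the strength of the signal.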
Copyright 2003 Wiley-Liss, Inc.