There has been considerable debate in the literature concerning bias in case-control association mapping studies due to population stratification. In this paper, we perform a theoretical analysis of the effects of population stratification by measuring the inflation in the test's type I error (false-positive rate). Using a model of stratified sampling, we derive an exact expression for the type I error as a function of population parameters and sample size. We give necessary and sufficient conditions for the bias to vanish when there is no statistical association between disease and marker genotype within any of the subpopulations making up the total population. We also investigate how the bias varies as the number of subpopulations increases and show, both theoretically and through simulations, that the bias can sometimes be quite substantial even with a very large number of subpopulations. In a companion simulation-based paper (Heiman et al., Part I, this issue), we focus on the confounding risk ratio (CRR) and its relationship to the type I error in the case of two subpopulations, and quantify the magnitude of the type I error that can occur with relatively low CRR values.
Copyright © 2004 S. Karger AG, Basel.
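The inflation of type I error described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the exact expression derived in the paper. It is a minimal simulation under simplifying assumptions: a binary marker (carrier versus non-carrier), a Pearson chi-square test on the pooled 2x2 table at the nominal 0.05 level, and disease status independent of the marker inside every subpopulation. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import random

CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of the chi-square distribution, 1 d.f.


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a zero margin carries no evidence of association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def draw_carriers(n, strata_mix, carrier_freq, rng):
    """Count marker carriers among n people drawn from a mixture of subpopulations."""
    strata = list(range(len(strata_mix)))
    carriers = 0
    for _ in range(n):
        k = rng.choices(strata, weights=strata_mix)[0]  # subpopulation of origin
        if rng.random() < carrier_freq[k]:  # marker drawn independently of disease
            carriers += 1
    return carriers


def type1_error(weights, prevalence, carrier_freq, n=200, reps=1000, seed=1):
    """Estimate the rejection rate of the pooled test when no true association exists.

    weights: subpopulation proportions; prevalence: disease prevalence per
    subpopulation; carrier_freq: marker carrier frequency per subpopulation;
    n: number of cases and, separately, of controls (all illustrative assumptions).
    """
    rng = random.Random(seed)
    # Cases come from subpopulation k in proportion to w_k * p_k,
    # controls in proportion to w_k * (1 - p_k).
    case_mix = [w * p for w, p in zip(weights, prevalence)]
    ctrl_mix = [w * (1 - p) for w, p in zip(weights, prevalence)]
    rejections = 0
    for _ in range(reps):
        a = draw_carriers(n, case_mix, carrier_freq, rng)
        c = draw_carriers(n, ctrl_mix, carrier_freq, rng)
        if chi2_2x2(a, n - a, c, n - c) > CHI2_CRIT_1DF_05:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Prevalence and marker frequency both differ between the two subpopulations.
    print("stratified:", type1_error([0.5, 0.5], [0.02, 0.10], [0.2, 0.5]))
    # Same prevalences, but identical marker frequency in both subpopulations.
    print("equal marker frequency:", type1_error([0.5, 0.5], [0.02, 0.10], [0.35, 0.35]))
```

With these illustrative values, the first estimate should sit well above the nominal 0.05, while the second should stay close to it, because the marker frequency no longer differs between subpopulations. Varying `weights`, `prevalence`, and `carrier_freq` gives a rough feel for how the inflation depends on the population parameters and on `n`.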