Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies has not been fully characterized. We propose a new approach to selecting tSNPs that identifies the set of SNPs with the highest power to detect association. Two-locus genotype frequencies are used in the power calculations. To demonstrate its utility, we applied this power-based method to a large number of SNPs that had been genotyped in Caucasian samples. We demonstrate that a significant reduction in genotyping effort can be achieved, although the size of the reduction depends on the genotypic relative risk, the mode of inheritance, and the prevalence of the disease in the human population. The tSNP sets identified by our method are remarkably robust to changes in the disease model when a small relative risk and an additive mode of inheritance are assumed. We have also evaluated the ability of the method to detect unidentified SNPs. Our findings have important implications for applying tSNPs from different data sources in association studies.
Copyright 2004 S. Karger AG, Basel