Objective: When numerous single nucleotide polymorphisms (SNPs) have been identified in a candidate gene, a relevant and still open question is how many, and which, of these SNPs should be tested to detect an association with the disease most efficiently. Testing them all is expensive and often unnecessary. Because of linkage disequilibrium (LD), alleles at different SNPs may be associated in the population. Knowing the allele carried at one SNP can therefore provide complete or partial information about the allele carried at a second SNP. We present a method that selects the most appropriate subset of SNPs in a candidate gene on the basis of the pairwise LD between SNPs.
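To make "pairwise linkage disequilibrium" concrete: for two biallelic SNPs, LD is commonly summarized by r^2, the squared correlation between alleles on the same haplotype, r^2 = D^2 / [p_A(1 - p_A) p_B(1 - p_B)] with D = p_AB - p_A p_B. The minimal sketch below computes the r^2 matrix for all SNP pairs. It assumes phased haplotypes coded 0/1. In practice, two-locus frequencies would typically be estimated from unphased genotypes, for example with a two-locus EM algorithm. The function names and toy data are hypothetical illustrations, not the authors' software.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_r2(haplotypes: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Squared allelic correlation r^2 between SNP i and SNP j.

    Assumes phased haplotypes with alleles coded 0/1.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at SNP j
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    return d * d / denom


def r2_matrix(haplotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise r^2 values; the diagonal is set to 1."""
    m = len(haplotypes[0])
    r2 = [[1.0 if a == b else 0.0 for b in range(m)] for a in range(m)]
    for a, b in combinations(range(m), 2):
        r2[a][b] = r2[b][a] = pairwise_r2(haplotypes, a, b)
    return r2


if __name__ == "__main__":
    # Hypothetical toy data: 6 phased haplotypes at 3 SNPs.
    haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
    for row in r2_matrix(haps):
        print([round(x, 3) for x in row])
```

In this toy example, SNPs 0 and 1 always carry the same allele (r^2 = 1), so genotyping one fully informs the other. SNP 2 is only weakly associated with them (r^2 = 1/9, about 0.111).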
Method: The best subset is identified through power computations performed under different genetic models, assuming that one of the identified SNPs is the disease susceptibility variant.
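A power-based selection of this kind can be sketched as follows. The sketch rests on simplifying assumptions that are ours and not necessarily the authors':

- Each genetic model is collapsed into a single noncentrality parameter (ncp) for a 1-df chi-square allelic test at the causal SNP.
- Testing a marker in LD with the causal SNP is approximated by scaling that ncp by their r^2. This is a standard single-marker approximation, swapped in here for the paper's own power computations.
- A candidate subset is scored by its worst-case power over all SNPs assumed causal, using the best tested SNP in each case and ignoring multiple-testing correction.

A greedy search then adds SNPs until a target power is reached. The names `chi2_1df_power`, `worst_case_power`, and `greedy_select`, and all numeric values, are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence

_STD_NORMAL = NormalDist()


def chi2_1df_power(ncp: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test with noncentrality ncp at level alpha."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    s = ncp ** 0.5
    # chi2_1(ncp) is distributed as (Z + sqrt(ncp))^2, so we reject when |Z + s| > z.
    return _STD_NORMAL.cdf(s - z) + _STD_NORMAL.cdf(-s - z)


def worst_case_power(subset: Sequence[int], r2: Sequence[Sequence[float]],
                     ncp: float, alpha: float = 0.05) -> float:
    """Minimum, over each SNP assumed causal, of the best power among tested SNPs.

    Assumption: testing SNP j when SNP i is causal has noncentrality r2[i][j] * ncp.
    """
    return min(
        max(chi2_1df_power(r2[causal][j] * ncp, alpha) for j in subset)
        for causal in range(len(r2))
    )


def greedy_select(r2: Sequence[Sequence[float]], ncp: float,
                  target: float = 0.8, alpha: float = 0.05) -> List[int]:
    """Hypothetical greedy search: add SNPs until worst-case power >= target."""
    chosen: List[int] = []
    while len(chosen) < len(r2):
        best = max(
            (j for j in range(len(r2)) if j not in chosen),
            key=lambda j: worst_case_power(chosen + [j], r2, ncp, alpha),
        )
        chosen.append(best)
        if worst_case_power(chosen, r2, ncp, alpha) >= target:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical r^2 matrix: SNPs 0 and 1 are in complete LD, SNP 2 is weakly linked.
    r2 = [[1.0, 1.0, 1 / 9], [1.0, 1.0, 1 / 9], [1 / 9, 1 / 9, 1.0]]
    print(greedy_select(r2, ncp=10.0))  # prints [0, 2]
```

SNPs 0 and 1 are redundant here, so the search keeps only one of them plus SNP 2. The greedy loop is a shortcut. Scoring every subset of a given size is exact, but its cost grows combinatorially with the number of SNPs.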
Results: We applied the method to two data sets: an empirical study of the APOE gene region and a simulated study of one of the major genes (MG1) from Genetic Analysis Workshop 12. For both genes, the selected SNP sets were compared with those obtained by two other methods, which require reconstruction of multilocus haplotypes to identify haplotype-tag SNPs (htSNPs). With both data sets, our method performed better than the other two selection methods.
Copyright 2003 S. Karger AG, Basel