Two-stage sampling designs for gene association studies

Genet Epidemiol. 2004 Dec;27(4):401-14. doi: 10.1002/gepi.20047.

Abstract

We consider two-stage case-control designs for testing associations between single nucleotide polymorphisms (SNPs) and disease, in which a subsample of subjects is used to select a panel of "tagging" SNPs that will be considered in the main study. We propose a pseudolikelihood [Pepe and Fleming, 1991: JASA 86:108-113] that combines the information from both the main study and the substudy to test the association with any polymorphism in the original set. SNP-tagging [Chapman et al., 2003: Hum Hered 56:18-31] and haplotype-tagging [Stram et al., 2003a: Hum Hered 55:27-36] approaches are compared. We show that the cost-efficiency of such a design for estimating the relative risk associated with the causal polymorphism can be considerably better than for a single-stage design, even if the causal polymorphism is not included in the tag-SNP set. We also consider the optimal selection of cases and controls in such designs and the relative efficiency for estimating the location of a causal variant in linkage disequilibrium mapping. Nevertheless, as the cost of high-volume genotyping plummets and haplotype tagging information from the International HapMap Project [Gibbs et al., 2003: Nature 426:789-796] rapidly accumulates in public databases, such two-stage designs may soon become unnecessary.
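
To make the cost side of the two-stage design concrete, the following is a minimal sketch of the genotyping workload only. It assumes, purely for illustration, that every genotype costs the same, that the substudy subjects are typed at the full SNP set, and that the remaining main-study subjects are typed only at the tag SNPs. The function name `genotype_counts` and all numbers are hypothetical and are not taken from the paper. The sketch also ignores the loss of statistical information from typing fewer SNPs, which the cost-efficiency comparison in the abstract would need to account for.

```python
def genotype_counts(n_subjects, n_snps, n_substudy, n_tags):
    """Return (single_stage, two_stage) numbers of genotypes to be assayed.

    Illustrative assumptions (not from the paper):
      - single-stage: all subjects are typed at all SNPs;
      - two-stage: substudy subjects are typed at all SNPs, and the
        remaining subjects are typed at the tag SNPs only.
    """
    if not 0 <= n_substudy <= n_subjects:
        raise ValueError("n_substudy must lie between 0 and n_subjects")
    if not 0 <= n_tags <= n_snps:
        raise ValueError("n_tags must lie between 0 and n_snps")

    single_stage = n_subjects * n_snps
    two_stage = n_substudy * n_snps + (n_subjects - n_substudy) * n_tags
    return single_stage, two_stage


# Hypothetical study: 2000 subjects, 50 candidate SNPs,
# a 200-subject substudy, and 8 tag SNPs carried into the main study.
single, two = genotype_counts(n_subjects=2000, n_snps=50, n_substudy=200, n_tags=8)
print(single, two, two / single)
```

Under these made-up inputs the two-stage design requires 24,400 genotypes against 100,000 for the single-stage design. Whether that saving is worthwhile depends on how much precision is lost for the relative risk estimate, which this sketch does not model.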

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Case-Control Studies
  • Chromosome Mapping / methods*
  • Genetic Predisposition to Disease / epidemiology*
  • Genetics, Population*
  • Genome, Human
  • Genotype
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Research Design