Errors and linkage disequilibrium interact multiplicatively when computing sample sizes for genetic case-control association studies

Pac Symp Biocomput. 2003:490-501. doi: 10.1142/9789812776303_0046.

Abstract

Single nucleotide polymorphisms (SNPs) may be used in case-control designs to test for association between a SNP marker and a disease. Such designs may assume that the genotype data are reported without error. Our goal is to quantify the effects that errors have on sample size for case-control studies with haplotypes formed by a disease locus and a SNP marker locus in the presence of linkage disequilibrium (LD). We consider the effects of a recently published error model on the 2x3 chi-square analysis. We study the joint relation of LD and errors with sample size for three specific genetic disease models and two settings each of marker allele frequencies (a total of 6 studies). The minimal sample size necessary for a fixed asymptotic power is estimated as a fourth-degree polynomial in the variables S (error) and D' (LD measure) via backward stepwise regression. We find that increased error rates lower power. In all studies, we observe that LD and errors interact in a non-linear fashion. In particular, regression analyses show that several higher-order interaction terms have coefficients significantly different from 0 in each study, with the fraction of variance explained greater than 0.9999. Finally, the increase in sample size necessary to maintain constant asymptotic power and level of significance as a function of S is smallest when D' = 1 (perfect LD). The increase grows monotonically as D' decreases to 0.5 for all studies.
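The chain of calculations the abstract describes (marker genotype frequencies under a disease model and LD, genotyping error, then the sample size for a fixed asymptotic power of the 2x3 chi-square test) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes Hardy-Weinberg equilibrium (random union of haplotypes), positive LD parameterized by Lewontin's D', unaffected controls, equal numbers of cases and controls, and Pearson's noncentrality approximation for asymptotic power. The function `misclassify` and its parameter `eps` are hypothetical: they apply a simple uniform misclassification, not the recently published error model the abstract refers to, and `eps` is not the abstract's S. All function names and the numeric parameters in the usage loop are illustrative.

```python
import math
from itertools import product


def marker_genotype_freqs(p, q, dprime, penetrance):
    """Case and control frequencies of marker genotypes with 0, 1, 2 copies of marker allele 1.

    Assumes Hardy-Weinberg equilibrium (random union of haplotypes), D' >= 0,
    unaffected controls, and penetrance indexed by the number of disease alleles.
    """
    d = dprime * min(p * (1 - q), (1 - p) * q)  # D = D' * Dmax for positive LD
    # (disease allele count, marker allele-1 count, haplotype frequency)
    haps = [(1, 1, p * q + d), (1, 0, p * (1 - q) - d),
            (0, 1, (1 - p) * q - d), (0, 0, (1 - p) * (1 - q) + d)]
    case, ctrl = [0.0] * 3, [0.0] * 3
    for (da, ma, fa), (db, mb, fb) in product(haps, repeat=2):
        f = penetrance[da + db]
        case[ma + mb] += fa * fb * f        # joint P(marker genotype, affected)
        ctrl[ma + mb] += fa * fb * (1 - f)  # joint P(marker genotype, unaffected)
    k = sum(case)  # disease prevalence
    return [c / k for c in case], [c / (1 - k) for c in ctrl]


def misclassify(freqs, eps):
    """Hypothetical error model: each genotype is misreported as each of the
    other two genotypes with probability eps / 2 (not the published model)."""
    return [sum(freqs[i] * ((1 - eps) if i == j else eps / 2) for i in range(3))
            for j in range(3)]


def ncx2_sf_df2(x, lam, terms=500):
    """P(X > x) for a noncentral chi-square with 2 df and noncentrality lam,
    via the Poisson mixture of central chi-squares with 2 + 2j df."""
    pois = math.exp(-lam / 2)  # Poisson(lam / 2) weight for j = 0
    term = 1.0                 # (x/2)^j / j!
    partial = 0.0              # sum_{i <= j} (x/2)^i / i!
    total = 0.0
    for j in range(terms):
        partial += term
        total += pois * math.exp(-x / 2) * partial  # weight * P(chi2_{2+2j} > x)
        pois *= (lam / 2) / (j + 1)
        term *= (x / 2) / (j + 1)
    return total


def required_noncentrality(alpha=0.05, power=0.8):
    """Smallest noncentrality giving the target power for a 2-df chi-square test."""
    crit = -2 * math.log(alpha)  # exact upper-alpha quantile of chi-square with 2 df
    lo, hi = 0.0, 1.0
    while ncx2_sf_df2(crit, hi) < power:
        hi *= 2
    for _ in range(100):  # bisection; power increases with lam
        mid = (lo + hi) / 2
        if ncx2_sf_df2(crit, mid) < power:
            lo = mid
        else:
            hi = mid
    return hi


def cases_needed(case, ctrl, alpha=0.05, power=0.8):
    """Cases (equal to controls) for the 2x3 chi-square test, using
    lambda = n * sum_j (f_case_j - f_ctrl_j)^2 / (f_case_j + f_ctrl_j)."""
    effect = sum((a - b) ** 2 / (a + b) for a, b in zip(case, ctrl) if a + b > 0)
    if effect == 0:
        raise ValueError("no case-control difference: power cannot reach target")
    return math.ceil(required_noncentrality(alpha, power) / effect)


# Hypothetical parameters: p = q = 0.1, penetrances for 0, 1, 2 disease alleles.
pen = (0.01, 0.02, 0.04)
for dp in (1.0, 0.75, 0.5):
    case, ctrl = marker_genotype_freqs(0.1, 0.1, dp, pen)
    for eps in (0.0, 0.01, 0.05):
        n = cases_needed(misclassify(case, eps), misclassify(ctrl, eps))
        print(f"D'={dp:.2f} eps={eps:.2f} cases=controls={n}")
```

Evaluating `cases_needed` over a grid of error rates and D' values yields the kind of sample-size surface to which the abstract fits its polynomial in S and D'; the backward stepwise regression itself is not reproduced here. The 2-df noncentral chi-square tail is computed from its Poisson-mixture series so that the sketch needs only the standard library.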

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Case-Control Studies*
  • Computational Biology
  • Gene Frequency
  • Genotype
  • Haplotypes
  • Humans
  • Linkage Disequilibrium*
  • Models, Genetic
  • Polymorphism, Single Nucleotide*
  • Regression Analysis
  • Sample Size