Exact sample size needed to detect dependence in 2 x 2 x 2 tables

Biometrics. 2007 Dec;63(4):1245-52. doi: 10.1111/j.1541-0420.2007.00801.x.

Abstract

In the Georgia Centenarian Study (Poon et al., Exceptional Longevity, 2006), centenarian cases and young controls are classified according to three factors (age, ethnic origin, and single nucleotide polymorphisms [SNPs] of candidate longevity genes), each of which has two possible levels. Here we provide methodologies to determine the minimum sample size needed to detect dependence in 2 x 2 x 2 tables based on Fisher's exact test, evaluated either exactly or by Markov chain Monte Carlo (MCMC), assuming that only the case total L and the control total N are known. Our MCMC method uses serial computing, whereas the exact sample size problem is solved with parallel computing techniques. These tools will allow researchers to design efficient sampling strategies and to select informative SNPs. We apply them to 2 x 2 x 2 tables obtained from a pilot study of the Georgia Centenarian Study, and the resulting sample sizes provided important information for the subsequent major study. A comparison of the exact and MCMC methods showed that the MCMC method needed much less computation time (on average 10.16 times faster across the situations examined, S.E. = 2.60), but its sample size results were generally valid only for larger sample sizes (in the hundreds).
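The abstract does not state which null hypothesis or search procedure is used, so the Python below is only a minimal sketch under explicit assumptions. It assumes that "dependence" means departure from mutual independence of the three factors and that the exact test conditions on all one-way margins, including the case total L and control total N; it then enumerates every 2 x 2 x 2 table with those margins to obtain an exact p-value. Power at a given (L, N) is estimated by plain Monte Carlo simulation under hypothetical alternative cell probabilities `p_case` and `p_control` (ordered as ethnic/SNP cells (0,0), (0,1), (1,0), (1,1)), not by the authors' MCMC method, and all function names are illustrative.

```python
import math
import random
from functools import lru_cache

# Hypothetical helper names throughout; this is NOT the paper's implementation.
# Table layout: x[i][j][k] with i = case/control (L cases, N controls),
# j = ethnic-origin level, k = SNP level.
# ASSUMPTION: the null is mutual independence of the three factors, and the
# test conditions on all three one-way margins.


def log_fact(m):
    """Natural log of m!."""
    return math.lgamma(m + 1)


@lru_cache(maxsize=None)
def null_log_probs(L, N, s0, t0):
    """Null log-probabilities of every table with case total L, control total N,
    ethnic margin (s0, n - s0) and SNP margin (t0, n - t0)."""
    n = L + N
    lf = [log_fact(m) for m in range(n + 1)]
    # P(x | margins) = prod(margin!) / ((n!)^2 * prod(cell!)) under mutual independence.
    const = lf[L] + lf[N] + lf[s0] + lf[n - s0] + lf[t0] + lf[n - t0] - 2 * lf[n]
    out = []
    for a in range(L + 1):  # case cells a, b, c, d must sum to L
        for b in range(L - a + 1):
            for c in range(L - a - b + 1):
                d = L - a - b - c
                row0, col0 = s0 - a - b, t0 - a - c  # margins left for the controls
                if not (0 <= row0 <= N and 0 <= col0 <= N):
                    continue
                # The control 2 x 2 slice then has a single free cell e.
                for e in range(max(0, row0 + col0 - N), min(row0, col0) + 1):
                    cells = (a, b, c, d, e, row0 - e, col0 - e, N - row0 - col0 + e)
                    out.append(const - sum(lf[v] for v in cells))
    return tuple(out)


def exact_p_value(x):
    """Sum of null probabilities of all tables no more likely than the observed x."""
    L, N = sum(map(sum, x[0])), sum(map(sum, x[1]))
    s0 = sum(x[i][0][k] for i in range(2) for k in range(2))
    t0 = sum(x[i][j][0] for i in range(2) for j in range(2))
    n = L + N
    obs = (log_fact(L) + log_fact(N) + log_fact(s0) + log_fact(n - s0)
           + log_fact(t0) + log_fact(n - t0) - 2 * log_fact(n)
           - sum(log_fact(v) for plane in x for row in plane for v in row))
    # Small tolerance so tables tied with the observed one are not lost to rounding.
    return sum(math.exp(lp) for lp in null_log_probs(L, N, s0, t0) if lp <= obs + 1e-7)


def simulated_power(L, N, p_case, p_control, alpha=0.05, reps=200, seed=1):
    """Monte Carlo estimate of the probability that the exact test rejects at level alpha."""
    rng = random.Random(seed)
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    hits = 0
    for _ in range(reps):
        x = []
        for total, probs in ((L, p_case), (N, p_control)):
            counts = [[0, 0], [0, 0]]
            for j, k in rng.choices(cells, weights=probs, k=total):
                counts[j][k] += 1
            x.append(counts)
        if exact_p_value(x) <= alpha:
            hits += 1
    return hits / reps


def min_sample_size(p_case, p_control, target=0.8, ratio=1, n_max=60, **kw):
    """Smallest case total L (with N = ratio * L) whose estimated power reaches target."""
    for L in range(2, n_max + 1):
        if simulated_power(L, ratio * L, p_case, p_control, **kw) >= target:
            return L
    return None
```

For example, `min_sample_size([0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85])` scans L = N = 2, 3, ... and returns the first case total whose estimated power reaches 0.8. A linear scan is used instead of bisection because the power of a discrete exact test need not increase monotonically with sample size. The number of tables enumerated per margin configuration grows roughly in proportion to L^3 N, which illustrates why exact computation becomes expensive as sample sizes grow.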

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Age Distribution
  • Aging / genetics*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Georgia / epidemiology
  • Humans
  • Longevity / genetics*
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide / genetics*
  • Sample Size*
  • Sex Distribution