Performance of analytical methods for overdispersed counts in cluster randomized trials: sample size, degree of clustering and imbalance

Stat Med. 2009 Oct 30;28(24):2989-3011. doi: 10.1002/sim.3681.

Abstract

Many different methods have been proposed for the analysis of cluster randomized trials (CRTs) over the last 30 years. However, the evaluation of methods on overdispersed count data has been based mostly on the comparison of results using empirical data, i.e. in settings where the true model parameters are not known. In this study, we assess via simulation the performance of five methods for the analysis of counts in situations similar to real community-intervention trials. We used the negative binomial distribution to simulate overdispersed counts of CRTs with two study arms, allowing the period of time under observation to vary among individuals. We assessed different sample sizes, degrees of clustering and degrees of cluster-size imbalance. The compared methods are: (i) the two-sample t-test of cluster-level rates, (ii) generalized estimating equations (GEE) with empirical covariance estimators, (iii) GEE with model-based covariance estimators, (iv) generalized linear mixed models (GLMM) and (v) Bayesian hierarchical models (Bayes-HM). Variation in sample size and clustering led to differences between the methods in terms of coverage, significance, power and random-effects estimation. GLMM and Bayes-HM performed better in general, with Bayes-HM producing less dispersed random-effects estimates, although these were biased upward when clustering was low. GEE showed higher power but anticonservative coverage and elevated type I error rates. Imbalance affected the overall performance of the cluster-level t-test and the GEE's coverage in small samples. Important effects arising from accounting for overdispersion are illustrated through the analysis of a community-intervention trial on Solar Water Disinfection in rural Bolivia.
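
The simulation setting described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not give the data-generating details, so the log-normal cluster effect, the uniform follow-up times, and every name and numeric value here (simulate_arm, t_statistic, sigma_cluster, nb_shape, the rates and cluster counts) are assumptions chosen only for illustration. The sketch draws negative binomial counts as a gamma-Poisson mixture with individually varying time under observation, then computes method (i), the two-sample t statistic on cluster-level rates.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_arm(rng, n_clusters, cluster_size, rate, sigma_cluster, nb_shape):
    """Return one event rate (events per unit of person-time) per cluster."""
    rates = []
    for _ in range(n_clusters):
        # Assumed log-normal cluster effect: larger sigma_cluster, more clustering.
        cluster_effect = math.exp(rng.gauss(0.0, sigma_cluster))
        events, person_time = 0, 0.0
        for _ in range(cluster_size):
            t = rng.uniform(0.5, 1.0)  # assumed range of individual follow-up
            mu = rate * cluster_effect * t
            # Gamma-Poisson mixture: negative binomial with mean mu and
            # variance mu + mu**2 / nb_shape (overdispersed relative to Poisson).
            lam = rng.gammavariate(nb_shape, mu / nb_shape)
            events += poisson(rng, lam)
            person_time += t
        rates.append(events / person_time)
    return rates


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


rng = random.Random(2009)  # fixed seed so the run is reproducible
# Hypothetical design: 10 clusters of 30 individuals per arm.
control = simulate_arm(rng, 10, 30, rate=3.0, sigma_cluster=0.3, nb_shape=1.5)
treated = simulate_arm(rng, 10, 30, rate=1.5, sigma_cluster=0.3, nb_shape=1.5)
t, df = t_statistic(control, treated)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Repeating the last four lines over many seeds, with equal rates in both arms, would give a rough estimate of the type I error of the cluster-level t-test under this assumed model; unequal cluster sizes could be introduced by varying cluster_size per cluster.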

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Bayes Theorem
  • Bias
  • Binomial Distribution
  • Biostatistics*
  • Bolivia / epidemiology
  • Child, Preschool
  • Computer Simulation
  • Confidence Intervals
  • Diarrhea / epidemiology
  • Diarrhea / prevention & control
  • Disinfection
  • Female
  • Humans
  • Likelihood Functions
  • Linear Models
  • Male
  • Markov Chains
  • Models, Statistical*
  • Monte Carlo Method
  • Poisson Distribution
  • Randomized Controlled Trials as Topic / methods*
  • Sample Size
  • Solar Energy
  • Treatment Outcome
  • Water Purification