Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes

Jacqueline L Johnson; Sarah M Kreidler; Diane J Catellier; David M Murray; Keith E Muller; Deborah H Glueck

doi:10.1002/sim.6565

Stat Med. 2015 Nov 30;34(27):3531-45. doi: 10.1002/sim.6565. Epub 2015 Jun 18.

Authors

Jacqueline L Johnson¹, Sarah M Kreidler², Diane J Catellier³, David M Murray⁴, Keith E Muller⁵, Deborah H Glueck⁶

Affiliations

¹ Department of Psychiatry, University of North Carolina, Chapel Hill, NC, U.S.A.
² Neptune and Company, Lakewood, CO, U.S.A.
³ RTI International, Research Triangle Park, NC, U.S.A.
⁴ Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD, U.S.A.
⁵ Department of Health Outcomes and Policy, University of Florida, Gainesville, FL, U.S.A.
⁶ Department of Biostatistics and Informatics, University of Colorado Denver, Aurora, CO, U.S.A.

Abstract

We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage approach uses the cluster-specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size-α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates of four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach.

Keywords: Gaussian; Type I error; cluster randomized; group randomized; unbalanced.
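
The weighted two-stage analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the variance components are estimated, so the sketch assumes a simple method-of-moments (ANOVA) estimator and truncates the between-cluster variance at zero as one reading of "variance constrained to be positive". The function names `two_stage_weighted_test` and `simulate_trial`, the cluster-size range, and the intracluster correlation value are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import fmean


def two_stage_weighted_test(clusters):
    """Two-stage test of a two-arm treatment effect on cluster means.

    clusters: list of (arm, values), arm in {0, 1}, values a list of floats.
    Returns (effect, t_statistic, df) for the arm-1 minus arm-0 difference.
    """
    arms = [arm for arm, _ in clusters]
    sizes = [len(values) for _, values in clusters]
    means = [fmean(values) for _, values in clusters]  # stage 1: cluster means
    n_clusters, n_obs = len(clusters), sum(sizes)
    df = n_clusters - 2

    # Within-cluster variance, pooled over all clusters (assumed estimator).
    sse = sum((y - m) ** 2 for (_, values), m in zip(clusters, means) for y in values)
    sigma2_e = sse / (n_obs - n_clusters)

    # Between-cluster variance by the method of moments (assumed estimator).
    ssb, n_adj = 0.0, float(n_obs)
    for arm in (0, 1):
        idx = [j for j, a in enumerate(arms) if a == arm]
        n_arm = sum(sizes[j] for j in idx)
        arm_mean = sum(sizes[j] * means[j] for j in idx) / n_arm
        ssb += sum(sizes[j] * (means[j] - arm_mean) ** 2 for j in idx)
        n_adj -= sum(sizes[j] ** 2 for j in idx) / n_arm
    n0 = n_adj / df
    sigma2_b = max(0.0, (ssb / df - sigma2_e) / n0)  # truncate at zero

    # Stage 2: weighted least squares on the cluster means, with weights equal
    # to the inverse of the estimated variance of each cluster mean.
    weights = [1.0 / (sigma2_b + sigma2_e / n) for n in sizes]
    w_sum, w_mean = {}, {}
    for arm in (0, 1):
        idx = [j for j, a in enumerate(arms) if a == arm]
        w_sum[arm] = sum(weights[j] for j in idx)
        w_mean[arm] = sum(weights[j] * means[j] for j in idx) / w_sum[arm]
    rss = sum(w * (m - w_mean[a]) ** 2 for w, m, a in zip(weights, means, arms))
    effect = w_mean[1] - w_mean[0]
    se = math.sqrt(rss / df * (1.0 / w_sum[0] + 1.0 / w_sum[1]))
    return effect, effect / se, df


def simulate_trial(rng, clusters_per_arm=6, icc=0.05, effect=0.0):
    """Simulate unbalanced Gaussian cluster data with total variance 1."""
    sd_b, sd_e = math.sqrt(icc), math.sqrt(1.0 - icc)
    data = []
    for arm in (0, 1):
        for _ in range(clusters_per_arm):
            size = rng.randint(5, 40)      # unequal cluster sizes (hypothetical range)
            shift = rng.gauss(0.0, sd_b)   # random cluster effect
            data.append((arm, [arm * effect + shift + rng.gauss(0.0, sd_e)
                               for _ in range(size)]))
    return data


trial = simulate_trial(random.Random(2015))
effect, t_stat, df = two_stage_weighted_test(trial)
print(f"effect={effect:.3f}, t={t_stat:.3f}, df={df}")
```

The statistic would be referred to a t distribution with `df` degrees of freedom (number of clusters minus two). Repeating the simulation many times with `effect=0.0` and counting rejections gives an empirical Type I error rate for this particular sketch, which need not match the rates reported in the paper.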

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Bias*
Cluster Analysis*
Normal Distribution
Quality Improvement* / statistics & numerical data
Randomized Controlled Trials as Topic / statistics & numerical data
Research Design / standards
Research Design / statistics & numerical data
