Evaluating the impact of stratification on the power and cross-arm balance of randomized phase 2 clinical trials

Anna Moseley; Michael LeBlanc; Boris Freidlin; Rory M Shallis; Amer M Zeidan; David A Sallman; Harry P Erba; Richard F Little; Megan Othus

doi:10.1177/17407745241304065

Clin Trials. 2025 Jan 15:17407745241304065. doi: 10.1177/17407745241304065. Online ahead of print.

Authors

Anna Moseley¹, Michael LeBlanc¹, Boris Freidlin², Rory M Shallis³, Amer M Zeidan³, David A Sallman⁴, Harry P Erba⁵, Richard F Little⁶, Megan Othus¹

Affiliations

¹ Public Health Sciences Division and SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle, WA, USA.
² Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
³ Department of Internal Medicine, Yale School of Medicine and Yale Cancer Center, New Haven, CT, USA.
⁴ Malignant Hematology Program, Moffitt Cancer Center, Tampa, FL, USA.
⁵ Leukemia Program, Duke University Medical Center, Durham, NC, USA.
⁶ National Cancer Institute, Bethesda, MD, USA.

PMID: 39815460

Abstract

Background/aims: Randomized clinical trials often use stratified randomization to promote balance between arms. The primary endpoints of these trials are typically analyzed with a "stratified analysis," in which the analysis is performed separately within each subgroup defined by the stratification factors and the subgroup results are then weighted and combined. In the phase 3 setting, stratified analyses based on a small number of stratification factors can provide a small increase in power. The impact of stratification on power and type I error at smaller sample sizes, as in randomized phase 2 trials, has not been well characterized.
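As an illustration of "performed separately in each subgroup, then weighted and combined," the sketch below computes a Mantel-Haenszel pooled odds ratio for a binary outcome. This is an assumption made for the example: the abstract does not state which stratified test or endpoint the authors used, and the function name and table layout are hypothetical.

```python
from typing import Iterable, Tuple

# One 2x2 table per stratum (layout is an assumption for this sketch):
#   a = responders in arm A,      b = non-responders in arm A
#   c = responders in arm B,      d = non-responders in arm B
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(tables: Iterable[Table]) -> float:
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum: nothing to combine
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator
```

A stratum in which every patient landed in the same arm has either a = b = 0 or c = d = 0, so both of its terms are zero and its patients contribute nothing to the pooled estimate. This is one way small strata can erode the information used by a stratified analysis.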

Methods: We performed computational studies to characterize the power and cross-arm balance of modestly sized clinical trials (fewer than 170 patients) with varying numbers of stratification factors (0-6), sample sizes, randomization ratios (1:1 vs 2:1), and randomization methods (dynamic balancing vs stratified block).
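For readers unfamiliar with stratified block randomization, a minimal sketch follows. It is not the authors' simulation code: the function name, the arm labels, and the block contents are assumptions chosen for illustration. Each stratum draws assignments from its own shuffled block and starts a new block when the current one is exhausted.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_block_randomize(
    patient_strata: Sequence[Hashable],
    block: Sequence[str],
    rng: random.Random,
) -> List[str]:
    """Assign an arm to each patient, in enrollment order.

    patient_strata: one stratum label per patient (e.g. a tuple of factor levels).
    block: arm labels making up one block, e.g. ("A", "A", "B", "B") for 1:1
           or ("A", "A", "B") for 2:1 (hypothetical choices).
    """
    pending = defaultdict(list)  # stratum -> assignments left in its current block
    assignments = []
    for stratum in patient_strata:
        if not pending[stratum]:
            fresh = list(block)
            rng.shuffle(fresh)  # new permuted block for this stratum
            pending[stratum] = fresh
        assignments.append(pending[stratum].pop())
    return assignments
```

If every factor were binary (an assumption; the abstract does not give the number of levels), six factors would define 2^6 = 64 strata, so a trial of 170 patients would average fewer than three patients per stratum and most blocks would be left incomplete.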

Results: We found that the power of unstratified analyses was minimally impacted by the number of stratification factors used in randomization. Analyses stratified by 1-3 factors maintained power over 80%, while the power of stratified analyses dropped below 80% when four or more stratification factors were used. These trends held regardless of sample size, randomization ratio, and randomization method. For a given randomization ratio and sample size, increasing the number of factors used in randomization had an adverse impact on cross-arm balance. Stratified block randomization performed worse than dynamic balancing with respect to cross-arm balance when three or more stratification factors were used.
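Dynamic balancing can take several forms, and the abstract does not specify which was simulated. The sketch below shows one common variant, a deterministic minimization rule in the style of Pocock and Simon: each new patient goes to the arm that minimizes the summed marginal imbalance across that patient's factor levels, with ties broken at random. The function name, the imbalance measure (range of arm counts), and the 1:1 allocation are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def minimization_assign(
    patients: Sequence[Tuple[int, ...]],
    arms: Sequence[str] = ("A", "B"),
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Assign arms in enrollment order; each patient is a tuple of factor levels."""
    rng = rng or random.Random()
    # (factor index, level) -> number of patients already in each arm
    counts = defaultdict(lambda: {arm: 0 for arm in arms})
    assignments = []
    for levels in patients:
        scores = {}
        for candidate in arms:
            total = 0
            for factor, level in enumerate(levels):
                trial = dict(counts[(factor, level)])
                trial[candidate] += 1  # hypothetically place the patient here
                total += max(trial.values()) - min(trial.values())
            scores[candidate] = total
        best = min(scores.values())
        choice = rng.choice([arm for arm in arms if scores[arm] == best])
        for factor, level in enumerate(levels):
            counts[(factor, level)][choice] += 1
        assignments.append(choice)
    return assignments
```

Because this rule balances each factor's margins rather than every factor combination, it does not fragment the sample into sparse strata the way stratified blocks do as factors are added.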

Conclusion: Stratified analyses can decrease power in the setting of phase 2 trials when the number of patients in a stratification subgroup is small.

Keywords: Stratified randomization; block randomization; dynamic balancing; power; stratification; stratified analysis.