Background: The logrank test and the Cox proportional hazards model are routinely applied in the design and analysis of randomised controlled trials (RCTs) with time-to-event outcomes. Usually, sample size and power calculations assume proportional hazards (PH) of the treatment effect, i.e. the hazard ratio is constant over the entire follow-up period. If the PH assumption fails, the power of the logrank/Cox test may be reduced, sometimes severely. It is, therefore, important to understand how serious this can become in real trials, and for a proven, alternative test to be available to increase the robustness of the primary test.
Methods: We performed a systematic search to identify relevant articles in four leading medical journals that publish results of phase 3 clinical trials. Altogether, 50 articles satisfied our inclusion criteria. We digitised published Kaplan-Meier curves and created approximations to the original times to event or censoring at the individual patient level. Using the reconstructed data, we tested for non-PH in all 50 trials. We compared the results from the logrank/Cox test with those from the combined test recently proposed by Royston and Parmar.
Results: The PH assumption was checked and reported only in 28% of the studies. Evidence of non-PH at the 0.10 level was detected in 31% of comparisons. The Cox test of the treatment effect was significant at the 0.05 level in 49% of comparisons, and the combined test in 55%. In four of five trials with discordant results, the interpretation would have changed had the combined test been used. The degree of non-PH and the dominance of the p value for the combined test were strongly associated. Graphical investigation suggested that non-PH was mostly due to a treatment effect manifesting in an early follow-up and disappearing later.
Conclusions: The evidence for non-PH is checked (and, hence, identified) in only a small minority of RCTs, but non-PH may be present in a substantial fraction of such trials. In our reanalysis of the reconstructed data from 50 trials, the combined test outperformed the Cox test overall. The combined test is a promising approach to making trial design and analysis more robust.
Keywords: Combined test; Cox test; Hazard ratio; Logrank test; Non-proportional hazards; Randomised controlled trials; Robustness; Time-to-event outcome.