Background: Net survival is the survival probability we would observe if the disease under study were the only cause of death. When estimated from routinely collected population-based cancer registry data, this indicator is a key metric for cancer control. Unfortunately, such data typically contain a non-negligible proportion of missing values on important prognostic factors (eg, tumor stage).
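For readers who prefer a formula, the definition above can be written in the usual relative survival notation. This is a sketch under the standard additive-hazards assumption; the symbols $\lambda_O$, $\lambda_P$, $\lambda_E$, and $S_N$ are introduced here for illustration and do not appear in the original text.

```latex
% Observed (all-cause) hazard = expected population hazard + excess hazard due to the cancer
\lambda_O(t) = \lambda_P(t) + \lambda_E(t)

% Net survival depends on the excess hazard only
S_N(t) = \exp\!\left( -\int_0^t \lambda_E(u)\, du \right)
```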
Methods: We carried out an empirical study to compare the performance of complete records analysis and several multiple imputation strategies when net survival is estimated via a flexible parametric proportional hazards model that includes stage, a partially observed categorical covariate. Starting from fully observed cancer registry data, we induced missingness on stage under three scenarios. For each of these scenarios, we simulated 100 incomplete datasets and evaluated the performance of the different strategies.
Results: Ordinal logistic models are not suitable for the imputation of tumor stage. Complete records analysis may lead to grossly misleading estimates of net survival, even when the missing data mechanism is conditionally independent of survival time given the covariates and the bias in the excess hazard ratio estimates is negligible.
Conclusions: As key covariates are unlikely to be missing completely at random, studies estimating net survival should not rely on complete records analysis. When the missingness can be inferred from available data, appropriate multiple imputation should be performed. In the context of flexible parametric proportional hazards models with a partially observed stage covariate, a multinomial logistic imputation model for stage should be used, and it should include the Nelson-Aalen cumulative hazard estimate and the event indicator.
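As an illustration of the recommended imputation predictors, the following is a minimal sketch of how the Nelson-Aalen cumulative hazard could be computed for each patient, assuming right-censored follow-up times and a 0/1 event indicator. The function name `nelson_aalen` and the toy data are hypothetical and are not taken from the study; the resulting values, together with the event indicator, would then be passed as predictors to a multinomial logistic imputation model in the software of choice.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return the Nelson-Aalen cumulative hazard evaluated at each subject's own time.

    times  : follow-up times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)        # everyone is at risk at the start
    cumulative = 0.0
    hazard_at = {}
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # increment = events at t / number at risk just before t
            cumulative += d / at_risk
        hazard_at[t] = cumulative
        at_risk -= exits[t]
    return [hazard_at[t] for t in times]


# Toy data (illustrative only): five patients, two of them censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
H = nelson_aalen(times, events)

# Predictor rows for the stage imputation model: (cumulative hazard, event indicator)
predictors = list(zip(H, events))
```

Using the cumulative hazard rather than the raw survival time is a design choice: it lets the imputation model carry outcome information without assuming a particular functional form for time.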