Multiple imputation to minimise bias from missing stage information in estimates of early cancer diagnosis in England: a population-based study

Cancer Epidemiol. 2022 Aug:79:102198. doi: 10.1016/j.canep.2022.102198. Epub 2022 Jun 17.

Abstract

Introduction: Monitoring early diagnosis is a priority of cancer policy in England. However, information on stage has at times been missing for a large proportion of patients, which may bias temporal comparisons. Using multiple imputation, we previously estimated that early-stage diagnosis of colorectal cancer rose from 32% to 44% during 2008-2013. Here we examine the underlying assumptions of multiple imputation for missing stage using the same dataset.

Methods: Individually linked cancer registration, Hospital Episode Statistics (HES), and audit data were examined. Six imputation models including different interaction terms, post-diagnosis treatment, and survival information were assessed, and comparisons were drawn with the a priori optimal model. Models were further tested by setting stage values to missing for some patients under one plausible mechanism, then comparing actual and imputed stage distributions for these patients. Finally, a pattern-mixture sensitivity analysis was conducted.
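
The masking test described in the Methods can be illustrated with a minimal sketch on synthetic data. This is not the study's imputation model: the cohort, the probabilities, the missingness mechanism, and all function names are assumptions made for illustration, and the imputation is a simple within-stratum donor draw rather than a regression-based model. It only shows the general idea of deleting known stage values, imputing them with and without survival information, and comparing imputed against actual stage among the masked patients.

```python
import random


def simulate_patients(n, rng):
    """Synthetic cohort (assumed): one-year survivors are more often early stage."""
    patients = []
    for _ in range(n):
        survived = rng.random() < 0.75
        p_early = 0.50 if survived else 0.15
        stage = "early" if rng.random() < p_early else "late"
        patients.append({"survived": survived, "stage": stage})
    return patients


def mask_stage(patients, rng):
    """Assumed mechanism: stage is missing more often for patients who died within a year."""
    masked = []
    for p in patients:
        p_missing = 0.25 if p["survived"] else 0.55
        observed = None if rng.random() < p_missing else p["stage"]
        masked.append({"survived": p["survived"], "stage": observed})
    return masked


def impute_once(masked, rng, use_survival):
    """Fill each missing stage with a random observed donor, optionally within survival stratum."""
    donors = {}
    for p in masked:
        if p["stage"] is not None:
            key = p["survived"] if use_survival else "all"
            donors.setdefault(key, []).append(p["stage"])
    completed = []
    for p in masked:
        if p["stage"] is None:
            key = p["survived"] if use_survival else "all"
            completed.append(rng.choice(donors[key]))
        else:
            completed.append(p["stage"])
    return completed


def compare_on_masked(truth, masked, rng, use_survival, m=20):
    """Return (actual, mean imputed) early-stage share among patients whose stage was masked."""
    idx = [i for i, p in enumerate(masked) if p["stage"] is None]
    actual = sum(truth[i]["stage"] == "early" for i in idx) / len(idx)
    shares = []
    for _ in range(m):  # m imputed datasets, averaged
        completed = impute_once(masked, rng, use_survival)
        shares.append(sum(completed[i] == "early" for i in idx) / len(idx))
    return actual, sum(shares) / m


rng = random.Random(0)
truth = simulate_patients(20000, rng)
masked = mask_stage(truth, rng)
actual, with_survival = compare_on_masked(truth, masked, rng, use_survival=True)
_, without_survival = compare_on_masked(truth, masked, rng, use_survival=False)
print(f"actual early share among masked:   {actual:.3f}")
print(f"imputed, survival in the model:    {with_survival:.3f}")
print(f"imputed, survival left out:        {without_survival:.3f}")
```

In this toy setting, leaving survival out overstates early-stage diagnosis among the masked patients, because missingness and stage both depend on survival. A full analysis would also propagate parameter uncertainty across imputations, which this donor-draw sketch does not do.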

Results: Data from 196,511 patients with colorectal cancer were analysed, of whom 39.2% had missing stage. Inclusion of survival time increased the accuracy of imputation: the odds ratio for change in early-stage diagnosis during 2008-2013 was 1.7 (95% CI: 1.6-1.7) with survival to 1 year included, compared to 1.9 (95% CI: 1.9-2.0) with no survival information. Imputation estimates of stage were accurate in one plausible simulation. Pattern-mixture analyses indicated that the conclusions of our previous analysis would only change materially if stage were misclassified for 20% of the patients who had it categorised as late.
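
For orientation, the scale of the reported odds ratio can be checked with simple arithmetic on the 32% and 44% early-stage proportions quoted in the Introduction. The sketch below computes that crude odds ratio and then applies a hypothetical one-parameter shift, in which a fraction `delta` of baseline patients classified as late are reassigned to early. The shift is only a toy stand-in for the idea of a sensitivity analysis; it is not the pattern-mixture model used in the study, and the function names and `delta` values are assumptions.

```python
def odds(p):
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1.0 - p)


def odds_ratio(p_start, p_end):
    """Crude odds ratio for the change from p_start to p_end."""
    return odds(p_end) / odds(p_start)


def shifted_odds_ratio(p_start, p_end, delta):
    """Hypothetical scenario: a fraction delta of baseline late-stage cases were really early."""
    p_start_adjusted = p_start + delta * (1.0 - p_start)
    return odds_ratio(p_start_adjusted, p_end)


# Proportions quoted in the Introduction: 32% early stage rising to 44%.
crude = odds_ratio(0.32, 0.44)
print(f"crude odds ratio: {crude:.2f}")

# Assumed, illustrative reclassification fractions.
for delta in (0.0, 0.05, 0.10, 0.20):
    print(f"delta={delta:.2f} -> odds ratio {shifted_odds_ratio(0.32, 0.44, delta):.2f}")
```

The crude value is unadjusted, so it need not match the model-based estimates exactly; the point is only that a 32% to 44% change corresponds to an odds ratio of the same order as those reported.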

Interpretation: Multiple imputation models can substantially reduce bias from missing stage, but data on patients' one-year survival should be included for highest accuracy.

Keywords: Early diagnosis; Missing data; Missing stage; Multiple imputation; Pattern mixture; Population-based; Routine data; Sensitivity analysis; Stage at diagnosis; Survival; Temporal changes; Time trends.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Data Collection
  • Early Detection of Cancer*
  • Humans
  • Neoplasms* / diagnosis
  • Neoplasms* / epidemiology
  • Odds Ratio