Background: Population-based cancer survival estimates can provide insight into the real-world impacts of healthcare interventions and preventive services. However, survival estimates derived from population-based cancer registries can be biased by missed incident cases or incomplete vital status data. Long-term survival estimates in particular are prone to overestimation, since the proportion of deaths that are missed, for example through unregistered emigration, increases with follow-up time. The same applies to registry-based long-term prevalence estimates. The aim of this report is to introduce a method to detect missed deaths within cancer registry data such that long-term survival of cancer patients does not exceed survival in the general population.
Methods: We analyzed data from 15 German epidemiologic cancer registries covering the years 1970-2016 and from Surveillance, Epidemiology, and End Results (SEER)-18 registries covering 1975-2015. The method is based on comparing survival times until exit (death or end of follow-up) and ages at exit between deceased patients and surviving patients, stratified by diagnosis group, sex, age group and stage. Deceased patients with both follow-up time and age at exit in the highest percentile were regarded as outliers and used to fit a logistic regression. The regression was then used to classify each surviving patient as a survivor or a missed death. The procedure was repeated with successively lower percentile thresholds for deceased patients until long-term survival rates no longer exceeded the survival rates in the general population.
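The outlier-flagging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Patient` record, the function names, and the nearest-rank percentile rule are assumptions made for illustration, and the sketch assumes the percentile cutoffs are computed among deceased patients within a single stratum. The logistic regression and the comparison with general-population survival are not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    # Hypothetical record layout, chosen for illustration only.
    follow_up_years: float  # time from diagnosis to exit (death or end of follow-up)
    age_at_exit: float      # age at death or at end of follow-up
    deceased: bool


def percentile_cutoff(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed rule; the source does not specify one)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def flag_outlier_deaths(stratum: List[Patient], pct: float) -> List[Patient]:
    """Return deceased patients whose follow-up time AND age at exit both
    lie at or above the pct-th percentile among the deceased in this stratum."""
    deceased = [p for p in stratum if p.deceased]
    if not deceased:
        return []
    time_cut = percentile_cutoff([p.follow_up_years for p in deceased], pct)
    age_cut = percentile_cutoff([p.age_at_exit for p in deceased], pct)
    return [
        p for p in deceased
        if p.follow_up_years >= time_cut and p.age_at_exit >= age_cut
    ]
```

In the procedure described above, the flagged deaths would then serve as input to the logistic regression, and the percentile threshold `pct` would be lowered step by step until long-term survival no longer exceeds general-population survival.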
Results: For the German cancer registry data, 0.9% of total deaths were classified as having been missed. Excluding these missed deaths reduced 20-year relative survival estimates for all cancers combined from 140% to 51%. For white patients in the SEER data, classified missed deaths amounted to 0.02% of total deaths, resulting in a 20-year relative survival rate for all cancers combined that was 0.4 percentage points lower.
Conclusion: The method described here classified a relatively small proportion of missed deaths yet reduced long-term survival estimates to more plausible levels. The effects of missed deaths should be considered when calculating long-term survival or prevalence estimates.
Keywords: cancer registry data; classification algorithm; long-term survival; missed deaths; relative survival.
Copyright © 2023 Dahm, Barnes and Kraywinkel.