Background: Calculating accurate estimates of cancer survival is important for various analyses of cancer patient care and prognosis. Current US survival rates are estimated based on data from the National Cancer Institute's (NCI's) Surveillance, Epidemiology, and End RESULTS (SEER) program, covering approximately 28 percent of the US population. The National Program of Cancer Registries (NPCR) covers about 96 percent of the US population. Using a population-based database with greater US population coverage to calculate survival rates at the national, state, and regional levels can further enhance the effective monitoring of cancer patient care and prognosis in the United States. The first step is to establish the coding completeness and coding quality of the NPCR data needed for calculating survival rates and conducting related validation analyses.
Methods: Using data from the NPCR-Cancer Surveillance System (CSS) from 1995 through 2008, we assessed coding completeness and quality on 26 data elements that are needed to calculate cancer relative survival estimates and conduct related analyses. Data elements evaluated consisted of demographic, follow-up, prognostic, and cancer identification variables. Analyses were performed showing trends of these variables by diagnostic year, state of residence at diagnosis, and cancer site.
Results: Mean overall percent coding completeness by each NPCR central cancer registry averaged across all data elements and diagnosis years ranged from 92.3 percent to 100 percent. RESULTS showing the mean percent coding completeness for the relative survival-related variables in NPCR data are presented. All data elements but 1 have a mean coding completeness greater than 90 percent as was the mean completeness by data item group type. Statistically significant differences in coding completeness were found in the ICD revision number, cause of death, vital status, and date of last contact variables when comparing diagnosis years. The majority of data items had a coding quality greater than 90 percent, with exceptions found in cause of death, follow-up source, and the SEER Summary Stage 1977, and SEER Summary Stage 2000.
Conclusion: Percent coding completeness and quality are very high for variables in the NPCR-CSS that are covariates to calculating relative survival. NPCR provides the opportunity to calculate relative survival that may be more generalizable to the US population.