Background: Large national databases have become a common source of information on patterns of cancer care in the United States, particularly for low-incidence diseases such as sarcoma. Although aggregating information from many hospitals can achieve statistical power, this may come at a cost when complex variables must be abstracted from the medical record. It is unclear how often the Surveillance, Epidemiology, and End Results (SEER) database and the National Cancer Database (NCDB) have been used in musculoskeletal sarcoma research over the last two decades and whether their use tends to produce papers with conflicting findings.
Questions/purposes: (1) Is the number of published studies using the SEER and NCDB databases in musculoskeletal sarcoma research increasing over time? (2) What are the author, journal, and content characteristics of these studies? (3) Do studies using the SEER and the NCDB databases for similar diagnoses and study questions report concordant or discordant key findings? (4) Are the administrative data reported by our institution to the SEER and the NCDB databases concordant with the data in our longitudinally maintained, physician-run orthopaedic oncology dataset?
Methods: To answer our first three questions, PubMed was searched from 2001 through 2020 for all studies using the SEER or the NCDB databases to evaluate sarcoma. Studies were excluded from the review if they did not use these databases or studied anatomic locations other than the extremities, nonretroperitoneal pelvis, trunk, chest wall, or spine. To answer our first question, the number of SEER and NCDB studies was counted by year. The publication rate over the 20-year span was assessed with simple linear regression modeling. The difference in the mean number of studies per year between 5-year intervals (2001-2005, 2006-2010, 2011-2015, 2016-2020) was also assessed with Student t-tests. To answer our second question, we recorded and summarized descriptive data regarding author, journal, and content for these studies. To answer our third question, we grouped all studies by diagnosis and then identified studies that shared the same diagnosis and a similar major study question with at least one other study. We then categorized study questions (and their associated studies) as having concordant findings, discordant findings, or mixed findings. Proportions of studies with concordant, discordant, or mixed findings were compared. To answer our fourth question, a coding audit was performed assessing the concordance of nationally reported administrative data from our institution with data from our longitudinally maintained, physician-run orthopaedic oncology dataset in a series of patients during the past 3 years. Our orthopaedic oncology dataset is maintained on a weekly basis by the senior author, who manually records data directly from the medical record and sarcoma tumor board consensus notes; this dataset served as the gold standard for data comparison. We compared date of birth, surgery date, margin status, tumor size, clinical stage, and adjuvant treatment.
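As a rough illustration of the coding audit described above, the sketch below counts, field by field, how often a reported record disagrees with a reference (gold standard) record. It is not the authors' procedure: the field names, the `count_discrepancies` helper, the patient identifiers, and the two example records are all hypothetical and synthetic, and exact equality is assumed as the matching rule.

```python
from datetime import date

# Hypothetical field names for the six variables compared in the audit.
FIELDS = [
    "date_of_birth",
    "surgery_date",
    "margin_status",
    "tumor_size_cm",
    "clinical_stage",
    "adjuvant_treatment",
]


def count_discrepancies(reference, reported, fields=FIELDS):
    """Count, per field, the patients whose reported value differs from the reference."""
    counts = {field: 0 for field in fields}
    for patient_id, ref_record in reference.items():
        rep_record = reported.get(patient_id)
        if rep_record is None:
            continue  # patient missing from the reported data; skipped in this sketch
        for field in fields:
            if ref_record[field] != rep_record.get(field):
                counts[field] += 1
    return counts


# Synthetic reference records (not real patient data).
reference = {
    "P1": {
        "date_of_birth": date(1960, 1, 1),
        "surgery_date": date(2020, 3, 2),
        "margin_status": "negative",
        "tumor_size_cm": 8.0,
        "clinical_stage": "IIIA",
        "adjuvant_treatment": "radiation",
    },
    "P2": {
        "date_of_birth": date(1975, 6, 15),
        "surgery_date": date(2021, 9, 20),
        "margin_status": "negative",
        "tumor_size_cm": 4.5,
        "clinical_stage": "IB",
        "adjuvant_treatment": "none",
    },
}

# Synthetic reported records: copies of the reference with three altered values.
reported = {pid: dict(record) for pid, record in reference.items()}
reported["P1"]["margin_status"] = "positive"
reported["P1"]["clinical_stage"] = "II"
reported["P2"]["clinical_stage"] = "unknown"

audit_counts = count_discrepancies(reference, reported)
```

In a real audit, the matching rule for each field (for example, a tolerance for tumor size or a window for surgery date) would need to be defined in advance; exact equality is used here only to keep the sketch short.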
Results: The number of musculoskeletal sarcoma studies using the SEER and the NCDB databases has steadily increased over time in a linear regression model (β = 2.51; p < 0.001). The mean number of studies per year more than tripled during 2016-2020 compared with 2011-2015 (39 versus 13 studies; mean difference 26 ± 11; p = 0.03). Of the 299 studies in total, 56% (168 of 299) have been published since 2018. Nineteen institutions published more than five studies, and the largest number of studies from a single institution was 13. Orthopaedic surgeons authored 35% (104 of 299) of studies, and medical oncology journals published 44% (130 of 299). Of the 94 studies (31% of total [94 of 299]) that shared a major study question with at least one other study, 35% (33 of 94) reported discordant key findings, 29% (27 of 94) reported mixed key findings, and 36% (34 of 94) reported concordant key findings. Both concordant and discordant groups included papers on prognostic factors, demographic factors, and treatment strategies. When we compared nationally reported administrative data from our institution with our orthopaedic oncology dataset, we found clinically important discrepancies in adjuvant treatment (19% [15 of 77]), tumor size (21% [16 of 77]), surgery date (23% [18 of 77]), surgical margins (38% [29 of 77]), and clinical stage (77% [59 of 77]).
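The proportions reported above can be rechecked from the raw counts. The short sketch below does so for the audit discrepancies (out of 77 patients) and for the combined share of studies with discordant or mixed findings (33 plus 27 of 94). The variable names and the `percent` helper are illustrative assumptions; only the counts come from the text.

```python
AUDITED_PATIENTS = 77

# Discrepancy counts per field, as reported in the Results.
discrepancies = {
    "adjuvant treatment": 15,
    "tumor size": 16,
    "surgery date": 18,
    "surgical margins": 29,
    "clinical stage": 59,
}


def percent(count, total):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Percentage of audited patients with a discrepancy in each field.
rates = {field: percent(n, AUDITED_PATIENTS) for field, n in discrepancies.items()}

# Share of the 94 studies with a shared question that had discordant or mixed findings.
discordant_or_mixed = percent(33 + 27, 94)

for field, rate in rates.items():
    print(f"{field}: {rate}% ({discrepancies[field]} of {AUDITED_PATIENTS})")
print(f"discordant or mixed: {discordant_or_mixed}% ({33 + 27} of 94)")
```

The combined figure of 64% is the basis for the "almost two-thirds" statement in the Conclusion.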
Conclusion: Appropriate use of databases in musculoskeletal cancer research is essential to promote clear interpretation of findings, as almost two-thirds of the studies we evaluated that asked similar study questions produced discordant or mixed key findings. Readers should be mindful of the differences in what each database seeks to convey because asking the same questions of different databases may result in different answers depending on what information each database captures. Likewise, differences in how studies determine which patients to include or exclude, how they handle missing data, and what they choose to emphasize may result in different messages being drawn from large-database studies. Still, given the rarity and heterogeneity of sarcomas, these databases remain particularly useful in musculoskeletal cancer research for nationwide incidence estimations, risk factor/prognostic factor assessment, patient demographic and hospital-level variable assessment, patterns of care over time, and hypothesis generation for future prospective studies.
Level of evidence: Level III, therapeutic study.
Copyright © 2022 by the Association of Bone and Joint Surgeons.