Background: We developed a machine learning algorithm to predict the survival of patients with chondrosarcoma. The algorithm demonstrated excellent discrimination and calibration on internal validation in a derivation cohort based on data from the Surveillance, Epidemiology, and End Results (SEER) registry. However, the algorithm has not been validated in an independent external dataset.
Questions/purposes: Does the Skeletal Oncology Research Group (SORG) algorithm accurately predict 5-year survival in an independent patient population surgically treated for chondrosarcoma?
Methods: The SORG algorithm was developed using the SEER registry, which contains demographic data, tumor characteristics, treatment, and outcome values and includes approximately 30% of the cancer patients in the United States. The SEER registry was ideal for creating the derivation cohort, and consequently the SORG algorithm, because of the high number of eligible patients and the availability of most explanatory variables of interest. Between 1992 and 2013, 326 patients were treated surgically for extracranial chondrosarcoma of the bone at two tertiary care referral centers. Of those, 179 were accounted for at a minimum of 5 years after diagnosis in a clinical note at one of the two institutions, unless they died earlier, and were included in the validation cohort. The remaining 147 (45%) did not meet the minimum 5 years of followup at the institution and were not included in the validation of the SORG algorithm. The outcome (survival at 5 years) was checked for all 326 patients in the Social Security death index, and all 326 were included in a supplemental validation cohort to also ascertain validity for patients with less than 5 years of institutional followup. The variables used in the SORG algorithm to predict 5-year survival (sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location) were collected manually from medical records. The tumor characteristics were collected from the postoperative musculoskeletal pathology report. Predicted probabilities of 5-year survival were calculated for each patient in the validation cohort using the SORG algorithm, followed by an assessment of performance using the same metrics as used for internal validation, namely discrimination, calibration, and overall performance.
Discrimination was calculated using the concordance statistic (equivalent to the area under the receiver operating characteristic [ROC] curve), which measures how well the algorithm distinguishes patients who survived from those who did not and ranges from 0.5 (no better than a coin toss) to 1.0 (perfect discrimination). Calibration was assessed using the calibration slope and intercept from a calibration plot to measure the agreement between predicted and observed outcomes. A perfectly calibrated model yields a 45° upward line on the calibration plot, corresponding to a slope of 1 and an intercept of 0. Overall performance was determined using the Brier score, ranging from 0 (perfect prediction) to 1 (worst prediction). The Brier score was compared with the null-model Brier score, which shows the performance of a model that ignores all covariates. A Brier score lower than the null-model Brier score indicates better performance of the algorithm. For the external validation, an F1-score was added to measure the overall accuracy of the algorithm; it ranges between 0 (total failure of an algorithm) and 1 (perfect algorithm).
The 5-year survival was lower in the validation cohort than it was in the derivation cohort from SEER (61.5% [110 of 179] versus 76% [1131 of 1544]; p < 0.001). This difference was driven by a higher proportion of dedifferentiated chondrosarcoma in the institutional population than in the derivation cohort (27% [49 of 179] versus 9% [131 of 1544]; p < 0.001). Patients in the validation cohort also had larger tumors, higher tumor grades, and more nonextremity tumor locations than did those in the derivation cohort. These differences emphasize that the external validation was performed in a cohort that differed from the derivation cohort not only in its patients but also in its disease characteristics. Within the subpopulations of patients with conventional chondrosarcomas and of those with dedifferentiated chondrosarcomas, 5-year survival did not differ between the two cohorts.
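The discrimination, overall-performance, and F1 metrics described in the Methods can be illustrated with a minimal sketch. It is not the study's analysis code: the function names, the toy data, and the 0.5 classification threshold used for the F1-score are assumptions made for illustration only, and calibration slope and intercept are left out because they require fitting a logistic regression.

```python
from itertools import product


def concordance_statistic(y_true, y_prob):
    """Share of (survivor, non-survivor) pairs in which the survivor received
    the higher predicted survival probability; ties count as 0.5."""
    survivors = [p for y, p in zip(y_true, y_prob) if y == 1]
    deaths = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not survivors or not deaths:
        raise ValueError("need at least one patient in each outcome class")
    score = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(survivors, deaths)
    )
    return score / (len(survivors) * len(deaths))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true):
    """Brier score of a model that ignores covariates and predicts the
    observed survival proportion for every patient."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


def f1_score(y_true, y_prob, threshold=0.5):
    """Harmonic mean of precision and recall; threshold is an assumption."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, q in zip(y_true, pred) if y == 1 and q == 1)
    fp = sum(1 for y, q in zip(y_true, pred) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(y_true, pred) if y == 1 and q == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy data: 1 = alive at 5 years, 0 = died within 5 years.
y_true = [1, 1, 1, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]

print(concordance_statistic(y_true, y_prob))
print(brier_score(y_true, y_prob), null_brier_score(y_true))
print(f1_score(y_true, y_prob))
```

In this toy example the Brier score falls below the null-model Brier score, which is the comparison the Methods use to judge overall performance.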
Results: The concordance statistic for the validation cohort was 0.87 (95% CI, 0.80-0.91). Evaluation of the algorithm's calibration in the institutional population resulted in a calibration slope of 0.97 (95% CI, 0.68-1.3) and a calibration intercept of -0.58 (95% CI, -0.97 to -0.20). Finally, on overall performance, the algorithm had a Brier score of 0.152 compared with a null-model Brier score of 0.237, indicating a high level of overall performance. The F1-score was 0.836. For the supplementary validation in the total of 326 patients, the SORG algorithm had a concordance statistic of 0.89 (95% CI, 0.85-0.93). The calibration slope was 1.13 (95% CI, 0.87-1.39) and the calibration intercept was -0.26 (95% CI, -0.57 to 0.06). The Brier score was 0.11, with a null-model Brier score of 0.19. The F1-score was 0.901.
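As a rough consistency check on the reported null-model Brier score, assume the null model assigns every patient the observed 5-year survival proportion p of the validation cohort (this assumption is not stated in the abstract). The Brier score then reduces to p(1 - p), and with 110 of 179 patients alive at 5 years it matches the reported value of 0.237:

```latex
\[
\mathrm{Brier}_{\text{null}}
  = \frac{1}{n}\sum_{i=1}^{n} (p - y_i)^2
  = p(1-p)^2 + (1-p)\,p^2
  = p(1-p),
\]
\[
p = \frac{110}{179} \approx 0.615,
\qquad
p(1-p) \approx 0.615 \times 0.385 \approx 0.237 .
\]
```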
Conclusions: On external validation, the SORG algorithm retained good discriminative ability and overall performance but overestimated 5-year survival in patients surgically treated for chondrosarcoma. This internet-based tool can help guide patient counseling and shared decision making.
Level of evidence: Level III, prognostic study.