Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.

Abstract

Background: Bone metastasis in advanced cancer is challenging because of pain, functional issues, and reduced life expectancy. Treatment planning is complex, with consideration of factors such as location, symptoms, and prognosis. Prognostic models help guide treatment choices, with Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) showing promise in predicting survival for initial spinal metastases and extremity metastases treated with surgery or radiotherapy. Improved therapies extend patient lifespans, increasing the risk of subsequent skeletal-related events (SREs). Patients experiencing subsequent SREs often suffer from disease progression, indicating a deteriorating condition. For these patients, a thorough evaluation, including accurate survival prediction, is essential to determine the most appropriate treatment and avoid aggressive surgical treatment for patients with a poor survival likelihood. However, some variables in the SORG prediction model, such as tumor histology, visceral metastasis, and previous systemic therapies, might remain consistent between initial and subsequent SREs. Given the prognostic difference between patients with and without a subsequent SRE, the efficacy of established prognostic models (originally designed for individuals with an initial SRE) in addressing a subsequent SRE remains uncertain. Therefore, it is crucial to verify the model's utility for subsequent SREs.

Question/purpose: We aimed to evaluate the reliability of the SORG-MLAs for survival prediction in patients undergoing surgery or radiotherapy for a subsequent SRE for whom both the initial and subsequent SREs occurred in the spine or extremities.

Methods: We retrospectively included 738 patients who were 20 years or older and who received surgery or radiotherapy for initial and subsequent SREs at a tertiary referral center and a local hospital in Taiwan between 2010 and 2019. We excluded 74 patients whose initial SRE was in the spine and whose subsequent SRE occurred in the extremities, and 37 patients whose initial SRE was in the extremities and whose subsequent SRE was in the spine. The rationale was that different SORG-MLAs were exclusively designed for patients who had an initial spine metastasis and those who had an initial extremity metastasis, irrespective of whether they experienced metastatic events in other areas (for example, a patient experiencing an extremity SRE before his or her spinal SRE would also be regarded as a candidate for an initial spinal SRE). Because the algorithms had already been validated for such patients in previous studies, we excluded them to avoid overestimating our results. Five patients with malignant primary bone tumors and 38 patients in whom the metastasis's origin could not be identified were also excluded, leaving 584 patients for analysis. The 584 included patients were categorized into two subgroups based on the location of initial and subsequent SREs: the spine group (68% [399]) and the extremity group (32% [185]). No patients were lost to follow-up. Patient data were collected at the time of presentation with a subsequent SRE, and survival predictions at this timepoint were calculated using the SORG-MLAs. Multiple imputation with the MissForest technique was conducted five times to impute missing values for each predictor. The effectiveness of the SORG-MLAs was gauged through several statistical measures, including discrimination (measured by the area under the receiver operating characteristic curve [AUC]), calibration, overall performance (Brier score), and decision curve analysis.
Discrimination refers to the model's ability to differentiate between those with the event and those without the event. An AUC ranges from 0.5 to 1.0, with 0.5 indicating the worst discrimination and 1.0 indicating perfect discrimination. An AUC of 0.7 is considered clinically acceptable discrimination. Calibration is the comparison between the frequency of observed events and the predicted probabilities. In an ideal calibration, the observed and predicted survival rates should be congruent. The logarithm of the observed-to-expected survival ratio [log(O:E)] offers insight into the model's overall calibration by considering the total number of observed (O) and expected (E) events. The Brier score measures the mean squared difference between the predicted probability of the outcome for each individual and the observed outcome, ranging from 0 to 1, with 0 indicating perfect overall performance and 1 indicating the worst performance. Moreover, the prevalence of the outcome should be considered, so a null-model Brier score was also calculated by assigning a probability equal to the prevalence of the outcome (in this case, the actual survival rate) to each patient. The benefit of the prediction model is determined by comparing its Brier score with that of the null model. If a prediction model's Brier score is lower than the null model's Brier score, the prediction model is deemed to have good performance. A decision curve analysis was performed to evaluate each model's "net benefit," which weighs true positives against false positives at a given "threshold probability," that is, the ratio of risk to benefit of an intervention as derived from a comprehensive clinical evaluation and a well-discussed shared decision-making process. A good predictive model should yield a higher net benefit than the default strategies (treating all patients and treating no patients) across a range of threshold probabilities.
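The performance measures described above can be illustrated with a minimal Python sketch. This is not the study's analysis code: the function names, the toy outcomes and probabilities, and the 0.5 threshold are all hypothetical, and the net-benefit formula used here is the standard decision-curve definition (true positives minus false positives weighted by the odds of the threshold probability, per patient), which the abstract does not spell out. The natural logarithm is assumed for log(O:E).

```python
import math
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true: Sequence[int]) -> float:
    """Brier score when every patient is assigned the outcome prevalence."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


def log_observed_to_expected(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """log(O:E); natural log assumed. Negative values mean the model predicts
    more events than were observed (overestimation)."""
    return math.log(sum(y_true) / sum(y_prob))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Standard decision-curve net benefit at one threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the default 'treat all patients' strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = outcome observed, 0 = not observed.
toy_outcomes = [1, 1, 1, 0, 0, 1, 0, 1]
toy_probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.5]

model_brier = brier_score(toy_outcomes, toy_probs)
null_brier = null_brier_score(toy_outcomes)
print("Brier (model, null):", model_brier, null_brier)
print("log(O:E):", log_observed_to_expected(toy_outcomes, toy_probs))
print("Net benefit at 0.5 (model, treat all, treat none):",
      net_benefit(toy_outcomes, toy_probs, 0.5),
      net_benefit_treat_all(toy_outcomes, 0.5),
      0.0)
```

In this toy example the model's Brier score is below the null model's and its net benefit exceeds both default strategies, which is the pattern the text describes as good performance. Treating no patients always has a net benefit of zero.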

Results: For the spine group, the algorithms displayed acceptable AUC results (median AUCs of 0.69 to 0.72) for 42-day, 90-day, and 1-year survival predictions after treatment for a subsequent SRE. In contrast, the extremity group showed median AUCs ranging from 0.65 to 0.73 for the corresponding survival periods. All Brier scores were lower than those of their null models, indicating good overall performance of the SORG-MLAs in both cohorts. The SORG-MLAs yielded a net benefit for both cohorts; however, they overestimated 1-year survival probabilities in patients with a subsequent SRE in the spine, with a median log(O:E) of -0.60 (95% confidence interval -0.77 to -0.42).
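To give a sense of scale for the reported log(O:E), the short sketch below converts it back to an observed-to-expected ratio. It assumes the natural logarithm, which the abstract does not state; with a different base the ratios would differ. The variable names are illustrative only.

```python
import math

# Reported median log(O:E) and 95% CI for 1-year survival, spine group.
log_oe_median = -0.60
log_oe_ci = (-0.77, -0.42)

# Assumption: natural logarithm, so O/E = exp(log(O:E)).
oe_ratio = math.exp(log_oe_median)
oe_ratio_ci = tuple(math.exp(bound) for bound in log_oe_ci)

print(f"O/E ratio: {oe_ratio:.2f} "
      f"(95% CI {oe_ratio_ci[0]:.2f} to {oe_ratio_ci[1]:.2f})")
```

Under that assumption, observed 1-year survival was roughly half of what the algorithms predicted, consistent with the overestimation described above.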

Conclusion: The SORG-MLAs maintain satisfactory discriminatory capacity and offer considerable net benefits through decision curve analysis, indicating their continued viability as prediction tools in this clinical context. However, the algorithms overestimate 1-year survival rates for patients with a subsequent SRE of the spine, warranting consideration of specific patient groups. Clinicians and surgeons should exercise caution when using the SORG-MLAs for survival prediction in these patients and remain aware of potential mispredictions when tailoring treatment plans, with a preference for less invasive treatments. Ultimately, this study emphasizes the importance of enhancing prognostic algorithms and developing innovative tools for patients with subsequent SREs, because life expectancy in patients with bone metastases continues to improve and healthcare providers will encounter these patients more often in daily practice.

Level of evidence: Level III, prognostic study.

MeSH terms

  • Adult
  • Aged
  • Bone Neoplasms* / mortality
  • Bone Neoplasms* / secondary
  • Decision Support Techniques
  • Disease Progression
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Predictive Value of Tests
  • Prognosis
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Assessment
  • Risk Factors