Background: Traditional tools for predicting distant metastasis in renal cell carcinoma (RCC) remain insufficient. We aimed to establish an interpretable machine learning model for predicting distant metastasis in RCC patients.
Methods: We included a population-based cohort of 121433 patients (mean age = 63 years; 63.58% men) diagnosed with RCC between 2004 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) database. The LightGBM algorithm was used to develop the prediction model, which was assessed by the area under the receiver-operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The LightGBM model was then externally validated in 36395 RCC patients recorded in the SEER database between 2016 and 2018. The SHapley Additive exPlanations (SHAP) method was applied to interpret the model's predictions.
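The four evaluation metrics named in the Methods can be illustrated with a minimal sketch in plain Python. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration only. The AUC is computed in its rank-based form, as the probability that a randomly chosen metastatic case receives a higher predicted risk than a randomly chosen non-metastatic case.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in sample size, so this is
    suitable for small illustrations only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute AUC, accuracy, sensitivity, and specificity."""
    auc = roc_auc(y_true, y_prob)  # also validates that both classes occur
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "auc": auc,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Toy data (hypothetical): 1 = distant metastasis, 0 = none.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_probs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_metrics = evaluate(toy_labels, toy_probs)
```

With only 8.84% of patients having distant metastasis, accuracy alone can look high for a model that rarely predicts metastasis, which is one reason to report sensitivity, specificity, and AUC alongside it.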
Results: Of the 121433 patients included in the study cohort, 10730 (8.84%) had distant metastasis. The LightGBM model showed good performance in the internal validation set (AUC: 0.955, 95% CI: 0.951-0.959) and in the two temporal external validation sets (AUC: 0.963, 95% CI: 0.959-0.967; AUC: 0.961, 95% CI: 0.954-0.966). The model also performed well in sub-cohorts stratified by age, gender, and ethnicity. The calibration curve indicated that the predicted values were highly consistent with the observed values. SHAP plots demonstrated that chemotherapy was the most important variable for predicting distant metastasis in RCC patients.
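The calibration check mentioned in the Results can be sketched as follows. This is a minimal illustration, not the study's procedure: the equal-width binning, the function name, and the toy data are assumptions. Predictions are grouped into probability bins, and the mean predicted risk in each bin is compared with the observed event fraction; a well-calibrated model gives pairs that lie close to the diagonal.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y_true: Sequence[int], y_prob: Sequence[float],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event fraction, count) per bin.

    Bins are equal-width over [0, 1]; empty bins are skipped.
    """
    sums = [0.0] * n_bins    # sum of predicted probabilities per bin
    events = [0] * n_bins    # number of observed events per bin
    counts = [0] * n_bins    # number of cases per bin
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        sums[index] += prob
        events[index] += label
        counts[index] += 1
    return [(sums[i] / counts[i], events[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i] > 0]


# Toy data (hypothetical): 1 = distant metastasis, 0 = none.
cal_labels = [0, 0, 1, 1, 0, 1]
cal_probs = [0.05, 0.10, 0.15, 0.60, 0.70, 0.90]
cal_curve = calibration_bins(cal_labels, cal_probs, n_bins=2)
```

Plotting the first element of each tuple against the second gives the calibration curve; the count helps judge how much weight each point deserves.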
Conclusion: We developed an interpretable machine learning model capable of accurately predicting the risk of distant metastasis in RCC patients. The model could help identify high-risk patients who require additional treatment strategies and follow-up regimens.
Keywords: distant metastasis; interpretable; machine learning; prediction; renal cell carcinoma.
© 2025 Dong et al.