Predicting the pathological invasiveness in patients with a solitary pulmonary nodule via Shapley additive explanations interpretation of a tree-based machine learning radiomics model: a multicenter study

Quant Imaging Med Surg. 2023 Dec 1;13(12):7828-7841. doi: 10.21037/qims-23-615. Epub 2023 Oct 7.

Abstract

Background: Radiomics models could help assess whether pulmonary nodules are benign or malignant, how invasive they are, and what their prognosis is. However, the lack of interpretability limits the application of these models. We thus aimed to construct and validate an interpretable and generalizable computed tomography (CT) radiomics model to evaluate pathological invasiveness in patients with a solitary pulmonary nodule in order to improve the management of these patients.

Methods: We retrospectively enrolled 248 patients with CT-diagnosed solitary pulmonary nodules. Radiomic features were extracted from the nodular region and from perinodular regions of 3 and 5 mm. After coarse-to-fine feature selection, the radiomics score (radscore) was calculated using the least absolute shrinkage and selection operator logistic method. Univariate and multivariate logistic regression analyses were performed to determine the invasiveness-related clinicoradiological factors. The clinical-radiomics model was then constructed using the logistic regression and extreme gradient boosting (XGBoost) algorithms. The Shapley additive explanations (SHAP) method was used to explain the contributions of the features. After removing batch effects with the ComBat algorithm, we assessed the generalization of the explainable clinical-radiomics model in two independent external validation cohorts (n=147 and n=149).
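To make the idea behind the SHAP attributions concrete, the following is a minimal sketch that computes exact Shapley values by brute force for a toy scoring function over the four predictors named in the Results. Everything in it is an illustrative assumption: `toy_model`, its coefficients, the `patient` and `baseline` values, and the feature names are hypothetical and are not the study's fitted XGBoost model or data. In practice the SHAP library computes these values efficiently for tree ensembles rather than by enumerating feature subsets.

```python
from itertools import combinations
from math import factorial

# Feature names follow the abstract; values and coefficients below are made up.
FEATURES = ["radscore", "ct_value", "nodule_length", "crescent_sign"]


def toy_model(x):
    """Hypothetical risk score (NOT the study's model), with one interaction."""
    score = 0.8 * x["radscore"] + 0.01 * x["ct_value"] + 0.05 * x["nodule_length"]
    # Interaction term: crescent sign only adds risk for longer nodules.
    if x["crescent_sign"] and x["nodule_length"] > 10:
        score += 0.5
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all subsets.

    Features outside the subset are set to their baseline value.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for a subset of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in FEATURES}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical patient and reference ("average") patient.
patient = {"radscore": 1.2, "ct_value": -300.0, "nodule_length": 15.0, "crescent_sign": 1}
baseline = {"radscore": 0.0, "ct_value": -600.0, "nodule_length": 8.0, "crescent_sign": 0}

phi = shapley_values(toy_model, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a per-feature SHAP plot be read as a decomposition of one prediction. The interaction term is split equally between nodule length and crescent sign.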

Results: The clinical-radiomics XGBoost model integrating the radscore, CT value, nodule length, and crescent sign demonstrated better predictive performance than the clinical-radiomics logistic model in assessing pulmonary nodule invasiveness, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.889 [95% confidence interval (CI), 0.848-0.927] in the training cohort. The SHAP algorithm illustrated the contribution of each feature in the final model. The specific model decision process was visualized using a tree-based decision heatmap. Satisfactory generalization performance was shown, with AUCs of 0.889 (95% CI, 0.823-0.942) and 0.915 (95% CI, 0.851-0.963) in the two external validation cohorts.
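As an illustration of how an AUC and its 95% CI of the kind reported above can be obtained, here is a minimal sketch using the rank-based (Mann-Whitney) definition of the AUC and a percentile bootstrap. The abstract does not state how the intervals were computed, so the bootstrap is an assumption; the function names `auc` and `bootstrap_ci` and the eight labels and scores are hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling patients with replacement)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up example: 1 = invasive, 0 = non-invasive, with predicted probabilities.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only eight cases the interval is very wide; cohorts of the sizes reported in the study would give much tighter intervals.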

Conclusions: An interpretable and generalizable clinical-radiomics model for predicting pulmonary nodule invasiveness was constructed to help clinicians determine the invasiveness of pulmonary nodules and devise assessment strategies in an easily understandable manner.

Keywords: Pulmonary nodules; Shapley additive explanations (SHAP); extreme gradient boosting (XGBoost); invasiveness; radiomics.