Current risk stratification systems for thyroid nodules suffer from low specificity and high biopsy rates. Machine learning (ML) has recently been introduced to assist thyroid nodule diagnosis, but it lacks interpretability. Here, we developed and validated ML models on 3965 thyroid nodules and compared them with the American College of Radiology Thyroid Imaging, Reporting and Data System (ACR TI-RADS). A SHapley Additive exPlanation (SHAP) algorithm was then used to interpret the results of the best-performing ML model. Clinical characteristics, including thyroid-function tests, were collected from medical records. Five ACR TI-RADS ultrasonography (US) categories plus nodule size were assessed by experienced radiologists. Random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost) were used to build US-only and US-clinical ML models. The ML models and ACR TI-RADS were compared in terms of diagnostic performance and unnecessary biopsy rate. Among the ML models, the US-only RF model (hereafter, Thy-Wise) achieved the best performance. Compared with ACR TI-RADS, Thy-Wise showed higher accuracy (82.4% vs 74.8% for internal validation; 82.1% vs 73.4% for external validation) and specificity (78.7% vs 68.3% for internal validation; 78.5% vs 66.9% for external validation) while maintaining sensitivity (91.7% vs 91.2% for internal validation; 91.9% vs 91.1% for external validation), and it reduced the unnecessary biopsy rate (15.3% vs 32.3% for internal validation; 15.7% vs 47.3% for external validation). The SHAP-based interpretation of Thy-Wise enables clinicians to better understand the reasoning behind the diagnosis, which may facilitate the clinical translation of this model.
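As an illustration of the diagnostic performance measures compared above, the following minimal sketch computes sensitivity, specificity and accuracy from binary labels. It is not the study's code: the function name `diagnostic_metrics`, the `DiagnosticMetrics` container, the 0 = benign / 1 = malignant encoding and the toy labels are all assumptions made for this example. The unnecessary biopsy rate is left out because its exact definition is not given here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    accuracy: float     # (TP + TN) / all nodules


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> DiagnosticMetrics:
    """Compute metrics from labels encoded as 0 (benign) or 1 (malignant).

    The encoding is an assumption of this sketch, not taken from the study.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign nodule")

    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


# Toy labels invented for illustration only (not study data).
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = diagnostic_metrics(toy_truth, toy_preds)
print(toy_metrics)
```

In this toy case, three of four malignant nodules and four of six benign nodules are classified correctly. The same function could be applied to the predictions of any classifier or to a rule-based category threshold, so that both are scored on an identical basis.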
Keywords: diagnosis; machine learning; random forest; thyroid nodules; ultrasonography.
© 2022 UICC.