Automatic tumor segmentation and lymph node metastasis prediction in papillary thyroid carcinoma using ultrasound keyframes

Xian-Ya Zhang; Di Zhang; Zhi-Yuan Wang; Jun Chen; Jia-Yu Ren; Ting Ma; Jian-Jun Lin; Christoph F Dietrich; Xin-Wu Cui

doi:10.1002/mp.17498


Med Phys. 2025 Jan;52(1):257-273. doi: 10.1002/mp.17498. Epub 2024 Oct 30.

Authors

Xian-Ya Zhang¹, Di Zhang², Zhi-Yuan Wang³, Jun Chen⁴, Jia-Yu Ren¹, Ting Ma¹, Jian-Jun Lin⁵, Christoph F Dietrich⁶, Xin-Wu Cui¹

Affiliations

¹ Department of Medical Ultrasound, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
² Department of Medical Ultrasound, The First Affiliated Hospital of Anhui Medical University, Hefei, China.
³ Department of Medical Ultrasound, Hunan Cancer Hospital/The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.
⁴ GE Healthcare, Wuhan, China.
⁵ Department of Medical Ultrasound, The First People's Hospital of Qinzhou, Qinzhou, China.
⁶ Department of Internal Medicine, Hirslanden Clinic, Bern, Switzerland.

PMID: 39475358

Abstract

Background: Accurate preoperative prediction of cervical lymph node metastasis (LNM) for papillary thyroid carcinoma (PTC) patients is essential for disease staging and individualized treatment planning, which can improve prognosis and facilitate better management.

Purpose: To establish a fully automated deep learning-enabled model (FADLM) for tumor segmentation and cervical LNM prediction in PTC using ultrasound (US) video keyframes.

Methods: This bicentric retrospective study enrolled 518 PTC patients. Patients from Hospital 1 were randomly divided into a training cohort (n = 340) and an internal test cohort (n = 83), and patients from Hospital 2 formed the external test cohort (n = 95). The FADLM integrated a mask region-based convolutional neural network (Mask R-CNN) for automatic segmentation of the primary thyroid tumor and a ResNet34 with a Bayes strategy for cervical LNM diagnosis. For comparison, a radiomics model (RM) using the same automated segmentation method, a traditional radiomics model (TRM) using manual segmentation, and a clinical-semantic model (CSM) were developed. Segmentation performance was evaluated with the Dice similarity coefficient (DSC). The predictive performance of the models was validated in terms of discrimination and clinical utility using the area under the receiver operating characteristic curve (AUC), heatmap analysis, and decision curve analysis (DCA). Predictive performance was compared between models with the DeLong test. The performance of two radiologists was compared with that of the FADLM, and the improvement in their diagnoses with FADLM assistance was analyzed in terms of accuracy, sensitivity, and specificity using McNemar's χ² test. A p-value less than 0.05 was considered statistically significant. The Benjamini-Hochberg procedure was applied to correct for multiple comparisons and limit Type I error.
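The evaluation metrics named in the Methods can be made concrete with a short sketch. The Python below is a minimal, dependency-free illustration under stated assumptions, not the authors' code: the function names (`dice_coefficient`, `roc_auc`, `benjamini_hochberg`, `net_benefit`) are hypothetical, masks are assumed to be 2-D lists of 0/1 pixels, AUC is computed in its Mann-Whitney (pairwise ranking) form rather than from a traced ROC curve, and the DeLong test is omitted for brevity.

```python
from typing import List, Sequence


def dice_coefficient(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    overlap = 0
    total = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            overlap += int(bool(p) and bool(t))
            total += int(bool(p)) + int(bool(t))
    # Assumed convention: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random LNM-positive case outscores a
    random LNM-negative case (Mann-Whitney form; ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def net_benefit(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Decision-curve net benefit at threshold probability pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and plotting it against the "treat all" and "treat none" strategies; any inputs passed to these helpers here would be toy values, not study data.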

Results: The FADLM yielded promising segmentation results in the training (DSC: 0.88 ± 0.23), internal test (DSC: 0.88 ± 0.23), and external test (DSC: 0.85 ± 0.24) cohorts. The AUCs of the FADLM for cervical LNM prediction in these cohorts were 0.78 (95% CI: 0.73, 0.83), 0.83 (95% CI: 0.74, 0.92), and 0.83 (95% CI: 0.75, 0.92), respectively. The FADLM significantly outperformed both the RM (AUCs: 0.78 vs. 0.72; 0.83 vs. 0.65; 0.83 vs. 0.68; all adjusted p-values < 0.05) and the CSM (AUCs: 0.78 vs. 0.71; 0.83 vs. 0.62; 0.83 vs. 0.68; all adjusted p-values < 0.05) across the three cohorts. The RM performed similarly to the TRM (AUCs: 0.61 vs. 0.63, adjusted p-value = 0.60) while significantly reducing segmentation time (3.3 ± 3.8 vs. 14.1 ± 4.2 s, p-value < 0.001). With FADLM assistance, the accuracies of the junior and senior radiologists in the external test cohort improved by 18% and 15%, respectively, and their sensitivities by 25% and 21% (all adjusted p-values < 0.05).
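The radiologist comparison is a paired design: the same external-test patients are read with and without FADLM assistance, so McNemar's test, which uses only the discordant pairs, applies instead of an unpaired test. The sketch below is illustrative only: `diagnostic_metrics` and `mcnemar_chi2` are hypothetical helpers, the continuity correction is an assumption (the abstract does not say whether one was used), and it is not the authors' analysis code. A sensitivity comparison would apply the same test restricted to LNM-positive patients.

```python
import math
from typing import Sequence, Tuple


def diagnostic_metrics(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary LNM calls (1 = metastasis)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    accuracy = (tp + tn) / len(truth)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def mcnemar_chi2(unaided: Sequence[int], aided: Sequence[int], truth: Sequence[int],
                 continuity: bool = True) -> Tuple[float, float]:
    """McNemar's chi-square test on paired reads of the same patients.

    b = cases correct without assistance but wrong with it; c = the reverse.
    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    if not len(unaided) == len(aided) == len(truth):
        raise ValueError("paired reads must cover the same patients")
    b = c = 0
    for u, a, t in zip(unaided, aided, truth):
        u_ok, a_ok = (u == t), (a == t)
        if u_ok and not a_ok:
            b += 1
        elif a_ok and not u_ok:
            c += 1
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Called on paired 0/1 calls and pathology labels, `mcnemar_chi2` returns the χ² statistic and an unadjusted p-value; when several such comparisons are made, those p-values would then be adjusted, for example with the Benjamini-Hochberg procedure described in the Methods.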

Conclusion: The FADLM, with its elaborately designed automated strategy based on US video keyframes, shows good potential for efficient and consistent prediction of cervical LNM in PTC. The FADLM outperformed the RM, the CSM, and the radiologists, with promising efficacy.

Keywords: cervical lymph node metastasis; deep learning; fully automated model; thyroid papillary carcinoma; ultrasound.

MeSH terms

Adult
Automation
Deep Learning
Female
Humans
Image Processing, Computer-Assisted* / methods
Lymphatic Metastasis* / diagnostic imaging
Male
Middle Aged
Retrospective Studies
Thyroid Cancer, Papillary* / diagnostic imaging
Thyroid Cancer, Papillary* / pathology
Thyroid Neoplasms* / diagnostic imaging
Thyroid Neoplasms* / pathology
Ultrasonography*