Performance of computer scientists in the assessment of thyroid nodules using TIRADS lexicons

J Endocrinol Invest. 2024 Dec 18. doi: 10.1007/s40618-024-02518-9. Online ahead of print.

Abstract

Objectives: Ultrasound (US) evaluation is recognized as pivotal in assessing the risk of malignancy (RoM) of thyroid nodules (TNs). Recently, various US-based risk-classification systems (Thyroid Imaging and Reporting Data Systems [TIRADSs]) have been developed. An important ongoing project concerns the creation of an international system (I-TIRADS) using a unified terminology. Because online tools allow clinicians and patients to stratify the RoM of any TN, the role of computer scientists (CSs) is likely to be relevant. This study explored the performance of a CS in assessing TNs across the TIRADS categories.

Methods: The most widely used TIRADSs (i.e., ACR, EU, and K) were considered. Three hundred scenarios were created. A CS was asked to assess the 300 TNs according to ACR-, EU-, and K-TIRADS. These data were compared with those of clinicians. The inter-observer agreement was estimated with Cohen's kappa (κ). Word-cloud plots were used to display the US descriptors associated with disagreement.
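
As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two raters, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name cohen_kappa and the toy category labels are hypothetical and are not taken from the study; the abstract does not state which software the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    rater_a, rater_b: equal-length sequences of category labels.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over categories of the product of the two raters'
    # marginal proportions for that category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # Degenerate case: both raters used one identical category for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not study data): four nodules, two raters.
cs_labels = [1, 1, 2, 2]
physician_labels = [1, 2, 2, 2]
kappa = cohen_kappa(cs_labels, physician_labels)
```

In this toy case the raters agree on 3 of 4 items (observed agreement 0.75) and chance agreement is 0.5, so kappa is 0.5; identical label sequences give kappa = 1.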

Results: The correspondence of the CS's assessment with that of the physicians was 100%, 81%, and 43%, using ACR-, EU-, and K-TIRADS, respectively. The CS was unable to classify 19/100 TNs according to EU-TIRADS and 15/100 TNs according to K-TIRADS. The inter-observer agreement between CS and physicians was excellent for ACR-TIRADS (κ = 1), moderate for EU-TIRADS (κ = 0.56), and fair for K-TIRADS (κ = 0.22). Among the non-concordant cases, 16/22 descriptors for EU-TIRADS and 18/18 descriptors for K-TIRADS were found.

Conclusion: CSs are confident with the ACR-TIRADS lexicon and structure but not with those of EU- and K-TIRADS, probably because the latter are pattern-based systems requiring medical training.

Keywords: Artificial Intelligence; TIRADS; Thyroid; Ultrasound.