Comparison of diagnostic accuracy and utility of artificial intelligence-optimized ACR TI-RADS and original ACR TI-RADS: a multi-center validation study based on 2061 thyroid nodules

Eur Radiol. 2022 Nov;32(11):7733-7742. doi: 10.1007/s00330-022-08827-y. Epub 2022 May 4.

Abstract

Objective: To determine whether an artificial intelligence-based modification of the Thyroid Imaging Reporting and Data System (TI-RADS) performs better than the current American College of Radiology (ACR) TI-RADS for risk stratification of thyroid nodules.

Methods: A total of 2061 thyroid nodules (in 1859 patients) sampled by fine-needle aspiration or surgery between January 2017 and July 2020 were retrospectively analyzed. Two radiologists blinded to the pathologic diagnosis evaluated nodule features in five ultrasound categories and assigned TI-RADS scores using both ACR TI-RADS and AI TI-RADS. The reference standard was the postoperative pathological diagnosis or the cytopathological diagnosis according to the Bethesda system. Inter-rater agreement was assessed by having two additional radiologists independently score a set of 100 nodules and was quantified with the intraclass correlation coefficient (ICC).

Results: AI TI-RADS assigned lower TI-RADS risk levels than ACR TI-RADS (p < 0.001) and had a larger area under the receiver operating characteristic curve (0.762 vs. 0.679, p < 0.001). The sensitivities of ACR TI-RADS and AI TI-RADS were similar (86.7% vs. 82.2%, p = 0.052), but specificity was higher with AI TI-RADS than with ACR TI-RADS (70.2% vs. 49.2%, p < 0.001). AI TI-RADS downgraded 743 (48.63%) benign nodules, indicating that 328 unnecessary fine-needle aspirations (FNAs), or 42.3% of the 776 biopsied nodules, could have been avoided. Inter-rater agreement was better with AI TI-RADS than with ACR TI-RADS (ICC, 0.808 vs. 0.861, p < 0.001).
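The percentages quoted in the Results can be checked with simple arithmetic. The Python sketch below is illustrative only: it uses only figures stated in the abstract, and the variable names and the helper `percent` are assumptions introduced here. The benign-nodule total it prints is not reported in the abstract; it is merely the count implied by 743 being 48.63% of the benign nodules.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Figures quoted in the Results paragraph of the abstract.
avoided_fna = 328          # FNAs that could have been avoided
biopsied_nodules = 776     # nodules that were biopsied
downgraded_benign = 743    # benign nodules downgraded by AI TI-RADS
downgraded_share = 48.63   # percent of benign nodules that were downgraded

# Share of biopsied nodules for which FNA could have been avoided.
avoidable_share = percent(avoided_fna, biopsied_nodules)

# Assumption: benign total implied by the two downgrade figures above;
# this number is derived here, not stated in the abstract.
implied_benign_total = round(downgraded_benign / (downgraded_share / 100.0))

print(f"Avoidable FNA share: {avoidable_share:.1f}%")       # 42.3%
print(f"Implied benign nodules: {implied_benign_total}")    # 1528
```

Under these figures, the 42.3% value is reproduced directly from 328 of 776 biopsied nodules.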

Conclusion: AI TI-RADS can achieve a meaningful reduction in the number of benign thyroid nodules recommended for biopsy and significantly improves specificity despite a slight decrease in sensitivity.

Key points: • AI TI-RADS assigned lower TI-RADS risk levels than ACR TI-RADS, showing similar sensitivity but higher specificity. • Nearly half of the benign nodules were downgraded, and unnecessary fine-needle aspiration (FNA) could have been avoided for 42.3% of the biopsied nodules. • AI TI-RADS had better overall inter-rater agreement.

Keywords: FNA; TI-RADS; Thyroid nodules; Ultrasound.

Publication types

  • Multicenter Study

MeSH terms

  • Artificial Intelligence
  • Biopsy, Fine-Needle
  • Humans
  • Retrospective Studies
  • Thyroid Nodule* / diagnostic imaging
  • Thyroid Nodule* / pathology
  • Ultrasonography / methods