Improving the Performance of Radiologists Using Artificial Intelligence-Based Detection Support Software for Mammography: A Multi-Reader Study

Jeong Hoon Lee; Ki Hwan Kim; Eun Hye Lee; Jong Seok Ahn; Jung Kyu Ryu; Young Mi Park; Gi Won Shin; Young Joong Kim; Hye Young Choi

doi:10.3348/kjr.2021.0476

Korean J Radiol. 2022 May;23(5):505-516. doi: 10.3348/kjr.2021.0476. Epub 2022 Apr 4.

Affiliations

¹ Lunit Inc., Seoul, Korea.
² Department of Radiology, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon, Korea.
³ Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul, Korea.
⁴ Department of Radiology, Inje University Busan Paik Hospital, Inje University College of Medicine, Busan, Korea.
⁵ Department of Radiology, Konyang University Hospital, Konyang University College of Medicine, Daejeon, Korea.
⁶ Department of Radiology, Gyeongsang National University Hospital and College of Medicine, Gyeongsang National University, Jinju, Korea.

# Contributed equally.

Abstract

Objective: To evaluate whether artificial intelligence (AI) for detecting breast cancer on mammography can improve the performance and time efficiency of radiologists reading mammograms.

Materials and methods: Commercial deep learning-based software for mammography was validated using external data collected from 200 patients at one hospital: 100 with breast cancer and 100 without (40 with benign lesions and 60 without lesions). Ten readers, five breast specialist radiologists (BSRs) and five general radiologists (GRs), rated the likelihood of malignancy on all mammography images using a seven-point scale in two reading sessions, one with and one without the aid of the AI-based software, separated by a two-month washout period. Reading time was automatically recorded using a web-based reporting system. Differences in the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and reading time between reading with and without AI were analyzed, accounting for data clustering by readers when indicated.
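As a minimal sketch of how the metrics named above can be computed from seven-point ratings, the example below derives an empirical AUROC (the Mann-Whitney estimate, with ties counted as one half) and the sensitivity and specificity at a cut-off. The ratings, the function names, and the cut-off of 4 are illustrative assumptions; the abstract does not state the threshold or the statistical software used, and this sketch does not model clustering by readers.

```python
from typing import Sequence, Tuple


def auroc(scores_pos: Sequence[int], scores_neg: Sequence[int]) -> float:
    """Empirical AUROC: the fraction of (cancer, non-cancer) pairs in which
    the cancer case receives the higher rating; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(
    scores_pos: Sequence[int], scores_neg: Sequence[int], threshold: int
) -> Tuple[float, float]:
    """Call a case positive when its rating is >= threshold.
    The threshold is an assumption, not a value taken from the study."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical seven-point ratings (1 = definitely benign, 7 = definitely
# malignant); these are made-up values, not data from the study.
cancer = [7, 6, 5, 3, 6]
no_cancer = [1, 2, 3, 4, 2]

area = auroc(cancer, no_cancer)
sens, spec = sensitivity_specificity(cancer, no_cancer, threshold=4)
print(f"AUROC={area:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Counting ties as one half matters on a seven-point scale because many cancer and non-cancer cases share the same rating.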

Results: The AUROCs of AI alone, the BSR group (average across five readers), and the GR group (average across five readers) were 0.915 (95% confidence interval, 0.876-0.954), 0.813 (0.756-0.870), and 0.684 (0.616-0.752), respectively. With AI assistance, the AUROC increased significantly to 0.884 (0.840-0.928) in the BSR group (p = 0.007) and to 0.833 (0.779-0.887) in the GR group (p < 0.001). AI assistance improved sensitivity in both groups (without vs. with AI: 74.6% vs. 88.6% in BSRs, p < 0.001; 52.1% vs. 79.4% in GRs, p < 0.001), but specificity did not differ significantly (66.6% vs. 66.4% in BSRs, p = 0.238; 70.8% vs. 70.0% in GRs, p = 0.689). The average reading time pooled across readers decreased significantly with AI assistance for BSRs (82.73 vs. 73.04 seconds, p < 0.001) but increased for GRs (35.44 vs. 42.52 seconds, p < 0.001).
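The size of each change follows directly from the reported group averages. The sketch below only redoes that arithmetic; the dictionary layout and the `summarize` helper are illustrative names, and each pair is ordered as (without AI, with AI) using the values quoted in the Results.

```python
# Reported group averages, ordered as (without AI, with AI).
reported = {
    "BSR": {"auroc": (0.813, 0.884), "sensitivity": (74.6, 88.6), "time_s": (82.73, 73.04)},
    "GR": {"auroc": (0.684, 0.833), "sensitivity": (52.1, 79.4), "time_s": (35.44, 42.52)},
}


def summarize(group: str) -> dict:
    """Absolute AUROC gain, sensitivity gain in percentage points, and
    relative change in reading time (negative means faster)."""
    values = reported[group]
    auroc_before, auroc_after = values["auroc"]
    sens_before, sens_after = values["sensitivity"]
    time_before, time_after = values["time_s"]
    return {
        "auroc_gain": auroc_after - auroc_before,
        "sensitivity_gain_pp": sens_after - sens_before,
        "time_change_pct": 100.0 * (time_after - time_before) / time_before,
    }


for group in reported:
    s = summarize(group)
    print(
        f"{group}: AUROC +{s['auroc_gain']:.3f}, "
        f"sensitivity +{s['sensitivity_gain_pp']:.1f} pp, "
        f"reading time {s['time_change_pct']:+.1f}%"
    )
```

Under these figures the GR group gains roughly twice as much AUROC as the BSR group (0.149 vs. 0.071), while reading time moves in opposite directions for the two groups.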

Conclusion: AI-based software improved the performance of radiologists regardless of their experience and changed the reading time, shortening it for BSRs and lengthening it for GRs.

Keywords: Artificial intelligence; Breast cancer; Deep-learning; Mammography; Multi-reader study; Reading time; Screening.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Breast Neoplasms* / diagnostic imaging
Female
Humans
Mammography / methods
Radiologists
Retrospective Studies
Sensitivity and Specificity
Software