Potential Impact of an Artificial Intelligence-based Mammography Triage Algorithm on Performance and Workload in a Population-based Screening Sample

J Breast Imaging. 2024 Sep 8:wbae056. doi: 10.1093/jbi/wbae056. Online ahead of print.

Abstract

Objective: To evaluate the potential impact of a commercial artificial intelligence (AI)-based triage device on screening mammography performance and workload in a population-based screening sample.

Methods: In this retrospective study, a sample of 2129 women who underwent screening mammography was evaluated. The performance of a commercial AI-based triage device was compared with radiologists' reports, actual outcomes, and national benchmarks using standard mammography performance metrics. Up to 5 years of follow-up examination results were reviewed to establish benignity. The algorithm sorted cases into "suspicious" and "low suspicion" groups. A theoretical workload reduction was calculated by subtracting cases triaged as "low suspicion" from the sample.
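
The workload calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `workload_reduction` and the counts in the usage note are hypothetical, and it assumes every case triaged as "low suspicion" is removed from the radiologist worklist.

```python
def workload_reduction(n_total: int, n_low_suspicion: int) -> tuple[int, float]:
    """Return (cases left to read, percent workload reduction).

    Assumption: all cases triaged as "low suspicion" are removed from
    the worklist, so the reduction is simply their share of the sample.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_low_suspicion <= n_total:
        raise ValueError("n_low_suspicion must be between 0 and n_total")
    remaining = n_total - n_low_suspicion          # cases still read by radiologists
    reduction_pct = 100.0 * n_low_suspicion / n_total
    return remaining, reduction_pct
```

For example, with hypothetical counts (not study data), `workload_reduction(1000, 625)` leaves 375 cases to read, a 62.5% reduction.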

Results: At the default 93% sensitivity setting, there was significant improvement (P < .05) in the following triage simulation mean performance measures compared with actual outcomes: 45.5% improvement in recall rate (13.4% to 7.3%; 95% CI, 6.2-8.3), 119% improvement in positive predictive value (PPV) 1 (5.3% to 11.6%; 95% CI, 9.96-13.4), 28.5% improvement in PPV2 (24.6% to 31.6%; 95% CI, 24.8-39.1), 20% improvement in sensitivity (83.3% to 100%; 95% CI, 100-100), and 7.2% improvement in specificity (87.2% to 93.5%; 95% CI, 92.4-94.5). A theoretical 62.5% workload reduction was possible. At the ultrahigh 99% sensitivity setting, a theoretical 27% workload reduction was possible. No cancers were missed by the algorithm at either sensitivity setting.
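
The improvement percentages reported above are consistent with relative changes between the actual and simulated rates. The sketch below recomputes them from the before and after values in the Results; the helper `relative_change_pct` and the dictionary layout are illustrative assumptions, not part of the study.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be nonzero")
    return 100.0 * (after - before) / before


# (actual outcome %, triage simulation %) as reported in the Results.
REPORTED_RATES = {
    "recall rate": (13.4, 7.3),   # a decrease is the improvement here
    "PPV1": (5.3, 11.6),
    "PPV2": (24.6, 31.6),
    "sensitivity": (83.3, 100.0),
    "specificity": (87.2, 93.5),
}

# Magnitude of the relative change for each metric, rounded to 1 decimal.
improvements = {
    name: round(abs(relative_change_pct(before, after)), 1)
    for name, (before, after) in REPORTED_RATES.items()
}
```

Recall rate is the only metric for which a lower value is better, so the sketch reports the magnitude of the change rather than its sign.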

Conclusion: Artificial intelligence-based triage in this simulation demonstrated potential for significant improvement in mammography performance and predicted substantial theoretical workload reduction without any missed cancers.

Keywords: artificial intelligence; decision support; deep learning; screening mammogram; workload triage.