Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers

Sarah E Hickman; Nicholas R Payne; Richard T Black; Yuan Huang; Andrew N Priest; Sue Hudson; Bahman Kasmai; Arne Juette; Muzna Nanaa; Fiona J Gilbert

doi:10.1148/radiol.233147

Radiology. 2024 Nov;313(2):e233147. doi: 10.1148/radiol.233147.

Affiliation

¹ From the Department of Radiology, University of Cambridge School of Clinical Medicine, Box 218, Level 5, Cambridge Biomedical Campus, Cambridge CB2 0QQ, United Kingdom (S.E.H., N.R.P., Y.H., A.N.P., M.N., F.J.G.); Department of Radiology, Royal London Hospital, Barts Health NHS Trust, London, United Kingdom (S.E.H.); Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom (R.T.B., A.N.P., F.J.G.); Engineering and Physical Sciences Research Council Cambridge Mathematics of Information in Healthcare Hub, University of Cambridge, Cambridge, United Kingdom (Y.H.); Peel and Schriek Consulting, London, United Kingdom (S.H.); Department of Radiology, Norfolk and Norwich University Hospital, Norwich, United Kingdom (B.K., A.J.); and University of East Anglia, Norwich, United Kingdom (B.K.).

PMID: 39560480

Abstract

Background: Deep learning (DL) algorithms have shown promising results in mammographic screening either compared to a single reader or, when deployed in conjunction with a human reader, compared with double reading.

Purpose: To externally validate the performance of three DL algorithms as mammographic screen readers in an independent UK data set.

Materials and Methods: Three commercial DL algorithms (DL-1, DL-2, and DL-3) were retrospectively investigated from January 2022 to June 2022 using consecutive full-field digital mammograms collected at two UK sites during 1 year (2017). Normal cases with 3-year follow-up and histopathologically proven cancer cases detected either at screening (that round or next) or within the 3-year interval were included. A preset specificity threshold equivalent to a single reader was applied. Performance was evaluated for stand-alone DL reading compared with single human reading, and for DL reading combined with human reading compared with double reading, using sensitivity and specificity as the primary metrics. P < .025 was considered to indicate statistical significance for noninferiority testing.

Results: A total of 26 722 cases (median patient age, 59.0 years [IQR, 54.0-63.0 years]) with mammograms acquired using machines from two vendors were included. Cases included 332 screen-detected, 174 interval, and 254 next-round cancers. Two of three stand-alone DL algorithms achieved noninferior sensitivity (DL-1: 64.8%, P < .001; DL-2: 56.7%, P = .03; DL-3: 58.9%, P < .001) compared with the single first reader (62.8%), and specificity was noninferior for DL-1 (92.8%; P < .001) and DL-2 (96.8%; P < .001) and superior for DL-3 (97.9%; P < .001) compared with the single first reader (96.5%). Combining the DL algorithms with human readers achieved noninferior sensitivity (67.0%, 65.6%, and 65.4% for DL-1, DL-2, and DL-3, respectively; P < .001 for all) compared with double reading (67.4%), and superior specificity (97.4%, 97.6%, and 97.6%; P < .001 for all) compared with double reading (97.1%).

Conclusion: Use of stand-alone DL algorithms in combination with a human reader could maintain screening accuracy while reducing workload.

Published under a CC BY 4.0 license. Supplemental material is available for this article.
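As a rough illustration of the primary metrics and the one-sided noninferiority threshold (P < .025) described above, the following is a minimal sketch and not the authors' analysis. The abstract does not state the noninferiority margin or the exact test, so the `margin` argument, the function names, and the unpaired Wald approximation are all assumptions; the study compared readers on the same cases, for which a paired method would be more appropriate.

```python
from math import erfc, sqrt


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of cancer cases flagged as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of normal cases correctly not flagged."""
    return tn / (tn + fp)


def noninferiority_p(p_new: float, p_ref: float,
                     n_new: int, n_ref: int, margin: float) -> float:
    """One-sided Wald P value for H0: p_new - p_ref <= -margin.

    Assumption: treats the two proportions as independent samples,
    which simplifies the paired design used in the study.
    `margin` is a hypothetical, user-chosen noninferiority margin.
    """
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = (p_new - p_ref + margin) / se
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * erfc(z / sqrt(2))


# Cancer total from the abstract: 332 + 174 + 254 = 760 cases.
n_cancers = 332 + 174 + 254
# Hypothetical margin of 0.10, chosen only for illustration.
p = noninferiority_p(0.648, 0.628, n_cancers, n_cancers, margin=0.10)
is_noninferior = p < 0.025
```

A smaller P value here indicates stronger evidence that the new reader's rate is not worse than the reference by more than the chosen margin; the result depends heavily on that margin, which is why it is passed explicitly rather than fixed.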

MeSH terms

Aged
Algorithms
Breast Neoplasms* / diagnostic imaging
Deep Learning*
Early Detection of Cancer / methods
Female
Humans
Mammography* / methods
Middle Aged
Radiographic Image Interpretation, Computer-Assisted / methods
Retrospective Studies
Sensitivity and Specificity*
United Kingdom