Breast MRI Background Parenchymal Enhancement Categorization Using Deep Learning: Outperforming the Radiologist

Sarah Eskreis-Winkler; Elizabeth J Sutton; Donna D'Alessio; Katherine Gallagher; Nicole Saphier; Joseph Stember; Danny F Martinez; Elizabeth A Morris; Katja Pinker

doi:10.1002/jmri.28111

J Magn Reson Imaging. 2022 Oct;56(4):1068-1076. doi: 10.1002/jmri.28111. Epub 2022 Feb 15.

Authors

Sarah Eskreis-Winkler¹, Elizabeth J Sutton¹, Donna D'Alessio¹, Katherine Gallagher¹, Nicole Saphier¹, Joseph Stember², Danny F Martinez¹, Elizabeth A Morris³, Katja Pinker¹

Affiliations

¹ Department of Radiology, Breast Imaging Service, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
² Department of Radiology, Neuroradiology Service, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
³ Department of Radiology, UC Davis Health, Davis, California, USA.

Abstract

Background: Background parenchymal enhancement (BPE) is assessed on breast MRI reports as mandated by the Breast Imaging Reporting and Data System (BI-RADS) but is prone to inter- and intrareader variation. Semiautomated and fully automated BPE assessment tools have been developed, but none has surpassed radiologist BPE designations.

Purpose: To develop a deep learning model for automated BPE classification and to compare its performance with current standard-of-care radiology report BPE designations.

Study type: Retrospective.

Population: Consecutive high-risk patients (i.e., >20% lifetime risk of breast cancer) who underwent contrast-enhanced screening breast MRI from October 2013 to January 2019. The study included 5224 breast MRIs, divided into 3998 training, 444 validation, and 782 testing exams. On radiology reports, 1286 exams were categorized as high BPE (i.e., marked or moderate) and 3938 as low BPE (i.e., mild or minimal).
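As an illustration of the binary labeling described above, the following minimal Python sketch maps the four BI-RADS BPE categories to high or low BPE and checks that the reported exam counts are consistent. The function name `binarize_bpe` and the dictionary names are hypothetical and are not taken from the study's code.

```python
# Hypothetical helper (not from the study): collapse the four BI-RADS BPE
# categories into the binary high/low labels described in the abstract.
HIGH_BPE = {"marked", "moderate"}
LOW_BPE = {"mild", "minimal"}


def binarize_bpe(category: str) -> int:
    """Return 1 for high BPE (marked/moderate), 0 for low BPE (mild/minimal)."""
    key = category.strip().lower()
    if key in HIGH_BPE:
        return 1
    if key in LOW_BPE:
        return 0
    raise ValueError(f"unknown BPE category: {category!r}")


# Exam counts as reported in the abstract.
split_sizes = {"training": 3998, "validation": 444, "testing": 782}
label_counts = {"high": 1286, "low": 3938}

total_exams = sum(split_sizes.values())
# Both breakdowns should describe the same set of exams.
assert total_exams == sum(label_counts.values())

# Fraction of exams labeled high BPE on the radiology report.
high_fraction = label_counts["high"] / total_exams
print(f"{total_exams} exams, {high_fraction:.1%} high BPE")
```

The roughly three-to-one imbalance between low and high BPE is one reason a threshold-independent metric such as the area under the ROC curve is a natural choice for evaluation.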

Field strength/sequence: A 1.5 T or 3 T system; one precontrast and three postcontrast phases of fat-saturated T1-weighted dynamic contrast-enhanced imaging.

Assessment: Breast MRIs were used to develop two deep learning models (Slab artificial intelligence [AI] and maximum intensity projection [MIP] AI) for BPE categorization using radiology report BPE labels. Models were tested on a held-out test set using radiology report BPE and three-reader averaged consensus as the reference standards.
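The abstract does not specify how the Slab and MIP model inputs were constructed. The sketch below is therefore only an assumption-laden illustration of the general idea: it forms a subtraction volume (postcontrast minus precontrast), projects it to a single full-volume MIP, and alternatively projects consecutive groups of slices into thinner slab MIPs. The function names, the slab thickness, the use of NumPy, and the toy data are all hypothetical.

```python
import numpy as np


def subtraction_volume(pre: np.ndarray, post: np.ndarray) -> np.ndarray:
    """Postcontrast minus precontrast signal, with negative values clipped to 0."""
    diff = post.astype(np.float32) - pre.astype(np.float32)
    return np.clip(diff, 0.0, None)


def mip(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Maximum intensity projection of the whole volume along one axis."""
    return volume.max(axis=axis)


def slab_mips(volume: np.ndarray, slab_thickness: int, axis: int = 0) -> list:
    """Project consecutive groups of `slab_thickness` slices separately.

    Assumption: a "slab" here means a MIP over a contiguous subset of slices;
    the study may define it differently.
    """
    slabs = []
    for start in range(0, volume.shape[axis], slab_thickness):
        index = [slice(None)] * volume.ndim
        index[axis] = slice(start, start + slab_thickness)
        slabs.append(volume[tuple(index)].max(axis=axis))
    return slabs


# Toy 4-slice volume of 2x2 pixels (synthetic values, not study data).
pre = np.full((4, 2, 2), 5)
post = np.arange(16).reshape(4, 2, 2)

sub = subtraction_volume(pre, post)
full_mip = mip(sub)
slabs = slab_mips(sub, slab_thickness=2)
```

Under these assumptions, the pixelwise maximum over all slab MIPs equals the full-volume MIP, so slabs retain at least as much information as a single projection while preserving some depth localization.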

Statistical tests: Model performance was assessed using receiver operating characteristic curve analysis. Associations between high BPE and BI-RADS assessments were evaluated using McNemar's chi-square test (α* = 0.025).
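For readers unfamiliar with these statistics, the following self-contained Python sketch shows one way to compute an area under the ROC curve (via the equivalent pairwise-ranking formulation) and a McNemar chi-square statistic for paired binary ratings. It is not the study's analysis code; the function names and the example inputs are hypothetical, and the paired design assumed here (AI and radiologist rating the same exams) is an interpretation of the abstract.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcnemar_chi2(b, c, continuity=True):
    """McNemar's chi-square test on the two discordant counts.

    b: exams rated high BPE by the first rater only.
    c: exams rated high BPE by the second rater only.
    Returns (statistic, p_value) using a chi-square distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical toy inputs, not study data.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
stat, p_value = mcnemar_chi2(10, 25, continuity=False)
significant = p_value < 0.025  # adjusted alpha reported in the abstract
```

Only the discordant pairs enter the McNemar statistic; exams on which both raters agree carry no information about which rater assigns high BPE more often.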

Results: The Slab AI model significantly outperformed the MIP AI model across the full test set (area under the curve of 0.84 vs. 0.79) using the radiology report reference standard. Using the three-reader consensus BPE labels as the reference standard, our AI model significantly outperformed radiology report BPE labels. Finally, the AI model was significantly more likely than the radiologist to assign "high BPE" to suspicious breast MRIs and significantly less likely than the radiologist to assign "high BPE" to negative breast MRIs.

Data conclusion: Fully automated BPE assessments for breast MRIs could be more accurate than BPE assessments from radiology reports.

Level of evidence: 4. Technical efficacy stage: 3.

Keywords: artificial intelligence; background parenchymal enhancement; breast MRI; cancer risk assessment; deep learning.

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Artificial Intelligence
Breast Neoplasms* / diagnostic imaging
Deep Learning*
Female
Humans
Magnetic Resonance Imaging / methods
Radiologists
Retrospective Studies

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States