Background: High breast density is a strong risk factor for breast cancer. Consistent and accurate breast density assessment is therefore necessary.
Purpose: To validate our proposed deep learning (DL) model and explore its impact on radiologists' density assessments.
Material and methods: A total of 3732 mammographic cases were collected as a validation set: 1686 cases from before the implementation of the DL model and 2046 cases from after its implementation. Five radiologists were divided into two groups (junior and senior) to assess all mammograms using either two- or four-category evaluation. Linear-weighted kappa (K) and intraclass correlation coefficient (ICC) statistics were used to analyze the consistency between radiologists before and after implementation of the DL model.
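As an illustration of the linear-weighted kappa statistic named above, the following is a minimal sketch for two raters. It is not the analysis code used in this study: the function name `linear_weighted_kappa`, the integer coding of the four density categories as 0 to 3, and the example ratings are all hypothetical assumptions made for illustration.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linear-weighted Cohen's kappa for two raters (illustrative sketch).

    Ratings are assumed to be integers in range(n_categories); this
    coding is an assumption for the example, not taken from the study.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    k = n_categories

    # Observed joint proportions: observed[i][j] is the share of cases
    # rated i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: |i - j| / (k - 1).
    def weight(i, j):
        return abs(i - j) / (k - 1)

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")

    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical four-category ratings from two readers on eight cases.
reader_1 = [0, 1, 2, 3, 1, 2, 2, 3]
reader_2 = [0, 1, 2, 3, 2, 2, 1, 3]
kappa = linear_weighted_kappa(reader_1, reader_2, n_categories=4)
print(f"linear-weighted kappa = {kappa:.3f}")
```

With linear weights, a disagreement between adjacent categories is penalized less than one between distant categories, which suits ordered density categories. For the two-category case the weights reduce to those of unweighted Cohen's kappa.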
Results: For the junior group, the accuracy and clinical acceptance of the DL model were 96.3% and 96.8% for two-category evaluation, and 85.6% and 89.6% for four-category evaluation, respectively. For the senior group, the accuracy and clinical acceptance were 95.5% and 98.0% for two-category evaluation, and 84.3% and 95.3% for four-category evaluation, respectively. Consistency within the junior group, within the senior group, and among all radiologists improved with the help of the DL model. For two-category evaluation, the corresponding K and ICC values improved from 0.73, 0.75, and 0.76 to 0.81, 0.81, and 0.80, respectively. For four-category evaluation, the K and ICC values improved from 0.73, 0.79, and 0.78 to 0.81, 0.82, and 0.82, respectively.
Conclusion: The DL model showed high accuracy and clinical acceptance in breast density categorization, and it helped improve consistency among radiologists.
Keywords: Breast density; artificial intelligence; deep learning; mammography.