Objectives: To compare breast density (BD) assessment provided by an automated BD evaluator (ABDE) with that provided by a panel of experienced breast radiologists, on a multivendor dataset.
Methods: Twenty-one radiologists assessed 613 screening/diagnostic digital mammograms from nine centers and six different vendors, using the BI-RADS a, b, c, and d density classification. The same mammograms were also evaluated by an ABDE providing the ratio between fibroglandular and total breast area on a continuous scale and, automatically, the BI-RADS score. A panel majority report (PMR) was used as reference standard. Agreement (κ) and accuracy (proportion of cases correctly classified) were calculated for binary (BI-RADS a-b versus c-d) and 4-class classification.
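As an illustration of the agreement and accuracy measures named above, the following minimal sketch derives a panel majority report from per-reader labels and compares a second set of labels against it, for both 4-class and binary classification. It rests on assumptions not stated in the abstract: κ is taken to be the unweighted Cohen's kappa, ties in the majority vote are broken toward the denser category, and all function names and the toy data are hypothetical.

```python
from collections import Counter

def majority_label(votes):
    """Most frequent BI-RADS label among readers.
    Assumption: ties are broken toward the denser category (a < b < c < d)."""
    counts = Counter(votes)
    top = max(counts.values())
    return max(label for label, n in counts.items() if n == top)

def to_binary(label):
    """Collapse BI-RADS a-b versus c-d into two classes."""
    return "dense" if label in ("c", "d") else "non-dense"

def accuracy(ref, pred):
    """Proportion of cases where pred matches the reference."""
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)

def cohen_kappa(ref, pred):
    """Unweighted Cohen's kappa (assumed here; the abstract does not name the variant)."""
    n = len(ref)
    p_o = accuracy(ref, pred)                      # observed agreement
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    labels = set(ref) | set(pred)
    p_e = sum(ref_counts[k] * pred_counts[k] for k in labels) / n**2  # chance agreement
    if p_e == 1:                                   # both raters used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: four mammograms, three readers each.
reader_votes = [["a", "a", "b"], ["b", "c", "c"], ["c", "d", "d"], ["a", "b", "b"]]
pmr = [majority_label(v) for v in reader_votes]    # panel majority report
abde = ["a", "c", "c", "b"]                        # hypothetical automated labels

acc4 = accuracy(pmr, abde)
kappa4 = cohen_kappa(pmr, abde)
pmr_bin = [to_binary(x) for x in pmr]
abde_bin = [to_binary(x) for x in abde]
acc2 = accuracy(pmr_bin, abde_bin)
kappa2 = cohen_kappa(pmr_bin, abde_bin)
```

In this toy example the automated labels disagree with the majority on one case in the 4-class setting but agree on every case once the categories are collapsed, which shows why binary agreement can be higher than 4-class agreement.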
Results: While the agreement of individual radiologists with the PMR ranged from κ = 0.483 to κ = 0.885, the ABDE correctly classified 563/613 mammograms (92 %). Substantial agreement for binary classification was found for individual reader pairs (κ = 0.620, standard deviation [SD] = 0.140), individual readers versus PMR (κ = 0.736, SD = 0.117), and individual readers versus ABDE (κ = 0.674, SD = 0.095). Agreement between ABDE and PMR was almost perfect (κ = 0.831).
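The qualitative labels used above ("substantial", "almost perfect") match the commonly used Landis and Koch bands for κ. The abstract does not name the scale, so the thresholds in this short sketch are an assumption, and `agreement_label` is a hypothetical helper. The sketch also rechecks the reported accuracy from the case counts.

```python
def agreement_label(kappa):
    """Map kappa to a Landis and Koch style label (assumed scale, not stated in the abstract)."""
    bands = [
        (0.80, "almost perfect"),
        (0.60, "substantial"),
        (0.40, "moderate"),
        (0.20, "fair"),
        (0.00, "slight"),
    ]
    for lower, label in bands:
        if kappa > lower:
            return label
    return "poor"

# Kappa values reported in the Results.
reported = {
    "reader pairs": 0.620,
    "reader vs PMR": 0.736,
    "reader vs ABDE": 0.674,
    "ABDE vs PMR": 0.831,
}
labels = {name: agreement_label(k) for name, k in reported.items()}

# Accuracy of the ABDE against the PMR, from the reported counts.
accuracy_pct = round(100 * 563 / 613)
```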
Conclusions: The ABDE showed almost perfect agreement with a 21-radiologist panel in binary BD classification on a multivendor dataset, making it a candidate reproducible alternative to visual evaluation.
Key points: Individual BD assessment differs from PMR with κ as low as 0.483. An ABDE correctly classified 92 % of mammograms with almost perfect agreement (κ = 0.831). An ABDE can be a valid alternative to subjective BD assessment.
Keywords: Automated system; BI-RADS density classification; Breast density; Digital mammography; Multireader/multivendor.