Reader variability in reporting breast imaging according to BI-RADS assessment categories (the Florence experience)

Breast. 2006 Feb;15(1):44-51. doi: 10.1016/j.breast.2005.04.019. Epub 2005 Aug 1.

Abstract

Inter- and intraobserver agreement (kappa statistic) in reporting according to BI-RADS assessment categories was tested on 12 dedicated breast radiologists, each with little prior working knowledge of BI-RADS, reading a set of 50 lesions (29 malignant, 21 benign). With four categories (R2, R3, R4, R5), intraobserver agreement was fair (0.21-0.40) for one radiologist, moderate (0.41-0.60) for two, substantial (0.61-0.80) for five, and almost perfect (>0.80) for four; with six categories (R2, R3, R4a, R4b, R4c, R5), it was fair, moderate, substantial, and almost perfect for three radiologists each. With four categories, interobserver agreement was fair for three radiologists, moderate for six, and substantial for three; with six categories, it was slight for one, fair for six, and moderate for five. Major disagreement occurred for the intermediate categories (category-specific kappa: R3=0.12, R4=0.25, R4a=0.08, R4b=0.07, R4c=0.10). We found insufficient intra- and interobserver consistency among breast radiologists reporting BI-RADS assessment categories. Although training may improve these results, simpler alternative reporting systems, focused on clinical decision-making, should be explored.
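
For context, the kappa statistic used here is Cohen's kappa, which corrects observed agreement for agreement expected by chance; the fair/moderate/substantial/almost-perfect bands cited above correspond to the widely used Landis and Koch scale. A minimal illustrative sketch (not from the paper; the reader ratings below are invented) of how kappa is computed for two readers' BI-RADS assessments:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of cases on which the two readers agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the readers' marginal frequencies,
    # summed over all categories either reader used.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical assessments (four-category scheme) by two readers.
reader_1 = ["R2", "R3", "R4", "R5", "R4", "R3", "R2", "R5"]
reader_2 = ["R2", "R4", "R4", "R5", "R3", "R3", "R2", "R4"]
print(f"kappa = {cohen_kappa(reader_1, reader_2):.2f}")  # kappa = 0.50
```

In this toy example the two readers agree on 5 of 8 cases (p_o = 0.625) against a chance expectation of p_e = 0.25, giving kappa = 0.50, i.e. moderate agreement on the scale used in the abstract.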

Publication types

  • Evaluation Study

MeSH terms

  • Breast Neoplasms / diagnostic imaging*
  • Female
  • Humans
  • Mammography / standards*
  • Mammography / statistics & numerical data*
  • Observer Variation
  • Predictive Value of Tests
  • Reproducibility of Results
  • Sensitivity and Specificity