A test of performance of breast MRI interpretation in a multicentre screening study

Magn Reson Imaging. 2006 Sep;24(7):917-29. doi: 10.1016/j.mri.2006.03.004. Epub 2006 May 23.

Abstract

Objectives: The aim of this study was to assess the consistency and performance of radiologists interpreting breast magnetic resonance imaging (MRI) examinations.

Materials and methods: Two test sets of eight cases comprising cancers, benign disease, technical problems and parenchymal enhancement were prepared from two manufacturers' equipment (X and Y) and reported by 15 radiologists using the recording form and scoring system of the UK MRI breast screening study [MAgnetic Resonance Imaging in Breast Screening (MARIBS)]. Variations in assessments of morphology, kinetic scores and diagnosis were measured by assessing intraobserver and interobserver variability and agreement. The sensitivity and specificity of reporting performance were determined, and receiver operating characteristic (ROC) curve analysis was performed.
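As an illustration of how agreement between two readers can be quantified, the sketch below computes Cohen's kappa for categorical conclusions on an eight-case test set. The abstract does not state which agreement statistic the study used, so kappa is an assumption chosen for illustration; the function name `cohen_kappa` and the reader conclusions are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two readers on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both readers must rate the same non-empty case list")
    n = len(ratings_a)
    # Observed proportion of cases on which the two readers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reader's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical conclusions from two readers on eight cases (not study data).
reader_1 = ["malignant", "benign", "benign", "malignant",
            "benign", "malignant", "benign", "benign"]
reader_2 = ["malignant", "benign", "malignant", "malignant",
            "benign", "benign", "benign", "benign"]

kappa = cohen_kappa(reader_1, reader_2)
```

The same function applied to one reader's first and repeat readings of the same cases would give an intraobserver figure; with 15 readers, a multi-rater statistic would be needed for a single summary value.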

Results: Intraobserver variation was seen in 13 (27.7%) of 47 of the radiologists' conclusions (four technical and seven pathological differences). Substantial interobserver variation was observed in the scores recorded for morphology, pattern of enhancement, quantification of enhancement and washout pattern. The overall sensitivity of breast MRI was high [88.6%, 95% confidence interval (CI) 77.4-94.7%], combined with a specificity of 69.2% (95% CI 60.5-76.7%). The sensitivities were similar for the two test sets (P=.3), but the specificity was significantly higher for the Manufacturer X dataset (P<.001). ROC curve analysis gave an area under the curve of 0.85 (95% CI 0.79-0.92).
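For readers unfamiliar with these measures, the following is a minimal sketch of how sensitivity, specificity, a 95% confidence interval and an area under the ROC curve can be computed from reader scores. The abstract does not say how the intervals were derived, so the Wilson score interval used here is an assumption, as are the rank-based AUC estimate, the five-point score, the threshold of 3 and all data values; none of them come from MARIBS.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


def sensitivity_specificity(truth, scores, threshold):
    """Call a case positive when its score is at or above the threshold."""
    tp = sum(1 for t, s in zip(truth, scores) if t and s >= threshold)
    fn = sum(1 for t, s in zip(truth, scores) if t and s < threshold)
    tn = sum(1 for t, s in zip(truth, scores) if not t and s < threshold)
    fp = sum(1 for t, s in zip(truth, scores) if not t and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp, fn, tn, fp)


def auc_rank(truth, scores):
    """AUC as the probability that a cancer outscores a non-cancer (ties 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = cancer, 0 = no cancer; scores on an assumed 1-5 scale.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [5, 4, 4, 2, 3, 2, 1, 1]

sens, spec, (tp, fn, tn, fp) = sensitivity_specificity(truth, scores, threshold=3)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
auc = auc_rank(truth, scores)
```

Moving the threshold trades sensitivity against specificity, which is why a threshold tuned on one manufacturer's data can yield a different specificity on another's.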

Conclusions: Substantial variation in all elements of the scoring system and in the overall diagnostic conclusions was observed between radiologists participating in MARIBS. High overall sensitivity was achieved with moderate specificity. Manufacturer-related differences in specificity possibly occurred because the numerical thresholds set for the scoring system were not optimised for both equipment manufacturers. Scoring systems developed on one manufacturer's equipment and software may not be transferable to those of other manufacturers.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / diagnosis*
  • Clinical Competence*
  • Female
  • Humans
  • Image Interpretation, Computer-Assisted*
  • Magnetic Resonance Imaging*
  • Mass Screening
  • Observer Variation
  • ROC Curve
  • Reproducibility of Results
  • Sensitivity and Specificity