Exploration of analysis methods for diagnostic imaging tests: problems with ROC AUC and confidence scores in CT colonography

Susan Mallett; Steve Halligan; Gary S Collins; Doug G Altman

doi:10.1371/journal.pone.0107633


PLoS One. 2014 Oct 29;9(10):e107633. doi: 10.1371/journal.pone.0107633. eCollection 2014.

Authors

Susan Mallett¹, Steve Halligan², Gary S Collins³, Doug G Altman³

Affiliations

¹ Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom.
² Centre for Medical Imaging, University College London, London, United Kingdom.
³ Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom.

Abstract

Background: Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity versus area under the receiver operating characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, with or without computer-assisted detection.

Methods: In a multireader multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, based on radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores for the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, so that differences in results are attributable to the statistical methods rather than to differences between readers.
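The two kinds of measure compared in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the ten-case dataset are hypothetical, binary reports stand in for radiological reporting of polyp presence, and the ROC AUC shown is the nonparametric (Mann-Whitney) estimate, which is only one of several ROC methods.

```python
from typing import Sequence


def sensitivity_specificity(reported: Sequence[bool], truth: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of binary reports against the reference standard."""
    tp = sum(1 for r, t in zip(reported, truth) if r and t)
    fn = sum(1 for r, t in zip(reported, truth) if not r and t)
    tn = sum(1 for r, t in zip(reported, truth) if not r and not t)
    fp = sum(1 for r, t in zip(reported, truth) if r and not t)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores: Sequence[float], truth: Sequence[bool]) -> float:
    """Nonparametric ROC AUC (Mann-Whitney): the probability that a randomly chosen
    diseased case scores higher than a randomly chosen non-diseased case,
    with tied scores counted as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ten-case example: truth is the reference standard (polyp present),
# reported is the reader's yes/no report, and scores are 0-100 confidence scores,
# with 0 recorded when no polyp was reported.
truth = [True, True, True, True, False, False, False, False, False, False]
reported = [True, True, True, False, False, False, False, False, False, True]
scores = [90, 80, 60, 0, 0, 0, 0, 0, 0, 70]

print(sensitivity_specificity(reported, truth))
print(empirical_auc(scores, truth))
```

On this toy data the binary reports give a sensitivity of 0.75 and a specificity of 5/6 (about 0.83), and the nonparametric AUC is 0.8125. The missed polyp and five true negatives all share a score of 0, so those pairs contribute only half-credit ties; this is the zero-score issue noted in the Results.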

Results: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems with using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; with the bimodal, non-normal distribution of scores; in fitting ROC curves, owing to extrapolation beyond the study data; and with the undue influence of a few false positive results. Variation in ROC AUC due to the use of different ROC methods exceeded the differences in ROC AUC between tests.
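Two of the problems listed above, the pile-up of zero scores and the leverage of a single false positive, can be illustrated with made-up numbers. The sketch assumes a 0-100 confidence scale with 0 meaning no polyp reported, and contrasts the nonparametric AUC with a simple plug-in binormal AUC, Φ((μ1 - μ0) / √(σ1² + σ0²)), computed from the sample means and standard deviations of each class. This plug-in estimate is a deliberately naive stand-in chosen for illustration; it is not the maximum-likelihood binormal fitting performed by dedicated ROC software, and the function names and data are hypothetical.

```python
import math
import statistics
from typing import Sequence


def mann_whitney_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Nonparametric AUC; ties (e.g. shared zero scores) count as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def plugin_binormal_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC implied by assuming each class's scores are normally distributed:
    Phi((mu1 - mu0) / sqrt(sd1**2 + sd0**2)), with moments estimated from the data.
    Assumes at least one class has nonzero spread."""
    mu1, mu0 = statistics.mean(pos), statistics.mean(neg)
    sd1, sd0 = statistics.stdev(pos), statistics.stdev(neg)
    z = (mu1 - mu0) / math.sqrt(sd1 ** 2 + sd0 ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# Hypothetical scores on a 0-100 scale; 0 = "no polyp reported".
diseased = [0, 0, 80, 90, 95, 100]  # two missed polyps pile up at zero
healthy = [0] * 12                  # every true negative scored zero
healthy_one_fp = [0] * 11 + [100]   # one confident false positive

for label, neg in [("no FP", healthy), ("one FP", healthy_one_fp)]:
    print(label,
          round(mann_whitney_auc(diseased, neg), 3),
          round(plugin_binormal_auc(diseased, neg), 3))
```

With these made-up numbers the nonparametric AUC is 0.833 (60/72) without the false positive and 0.771 (55.5/72) with it, while the plug-in binormal values are about 0.90 and 0.83. The normal assumption is plainly wrong for scores that are mostly zero, and the smooth binormal curve also covers false positive rates at which the data contain no operating points, which is the extrapolation problem in miniature.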

Conclusions: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found that sensitivity and specificity were a more reliable and clinically appropriate method for comparing diagnostic tests.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Area Under Curve
Colonic Polyps / diagnostic imaging*
Colonography, Computed Tomographic / methods*
Diagnostic Imaging / methods*
Humans
ROC Curve
Radiographic Image Interpretation, Computer-Assisted
Sensitivity and Specificity
