The use of "overall accuracy" to evaluate the validity of screening or diagnostic tests

Anthony J Alberg; Ji Wan Park; Brant W Hager; Malcolm V Brock; Marie Diener-West

doi:10.1111/j.1525-1497.2004.30091.x

J Gen Intern Med. 2004 May;19(5 Pt 1):460-5. doi: 10.1111/j.1525-1497.2004.30091.x.

Authors

Anthony J Alberg¹, Ji Wan Park, Brant W Hager, Malcolm V Brock, Marie Diener-West

Affiliation

¹ Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.

Abstract

Objective: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms all refer to a single summary measure, calculated from a 2 x 2 contingency table, that gives the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
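As an illustration of the definition above, the following minimal sketch computes overall accuracy as the proportion of correctly classified patients in a 2 x 2 table. The function name, argument names, and counts are hypothetical and are not taken from the article.

```python
def overall_accuracy_from_table(tp: int, fp: int, fn: int, tn: int) -> float:
    """Proportion of patients correctly classified in a 2 x 2 table.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives (illustrative names).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the 2 x 2 table contains no patients")
    # Correct classifications are the true positives plus the true negatives.
    return (tp + tn) / total


# Hypothetical counts, invented for illustration only:
# 100 diseased patients (80 detected), 900 non-diseased (810 correctly negative).
accuracy = overall_accuracy_from_table(tp=80, fp=90, fn=20, tn=810)
print(f"overall accuracy = {accuracy:.2f}")
```

With these invented counts, 890 of 1,000 patients are correctly classified, so the overall accuracy is 0.89 even though 20% of diseased patients are missed.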

Design: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, a search was performed for test evaluation studies published in the clinical literature from 2000 to 2002 in which overall accuracy derived from a 2 x 2 contingency table was reported.

Measurements and main results: Overall accuracy is the weighted average of a test's sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity as 1) the difference between sensitivity and specificity increases, 2) the prevalence deviates from 50%, or both. Both situations lead to an increasing deviation between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N = 25) illustrated that the prevalence-dependent nature of overall accuracy can lead to a distorted impression of the validity of a screening or diagnostic test.
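The weighted-average relationship described above can be sketched as follows. The function name and the sensitivity, specificity, and prevalence values are assumptions chosen for illustration; they are not results from the article.

```python
def overall_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy as a prevalence-weighted average.

    Sensitivity is weighted by prevalence and specificity by its
    complement (1 - prevalence), as stated in the abstract.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical test with unequal sensitivity (0.60) and specificity (0.95),
# evaluated at three illustrative prevalences.
for prevalence in (0.05, 0.50, 0.95):
    acc = overall_accuracy(sensitivity=0.60, specificity=0.95, prevalence=prevalence)
    print(f"prevalence = {prevalence:.2f}  overall accuracy = {acc:.4f}")
```

For this hypothetical test, overall accuracy is about 0.93 at 5% prevalence, 0.78 at 50%, and 0.62 at 95%, although the test's sensitivity and specificity never change. At low prevalence the figure sits close to specificity and hides the poor sensitivity; at high prevalence the reverse holds.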

Conclusions: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.

Publication types

Research Support, U.S. Gov't, P.H.S.
Review
Validation Study

MeSH terms

Diagnostic Techniques and Procedures / standards*
Humans
Mass Screening / standards*
Reference Standards
Reproducibility of Results
Sensitivity and Specificity
