Assessing validity of a depression screening instrument in the absence of a gold standard

Bizu Gelaye; Mahlet G Tadesse; Michelle A Williams; Jesse R Fann; Ann Vander Stoep; Xiao-Hua Andrew Zhou

doi:10.1016/j.annepidem.2014.04.009

Ann Epidemiol. 2014 Jul;24(7):527-31. doi: 10.1016/j.annepidem.2014.04.009. Epub 2014 May 2.

Authors

Bizu Gelaye¹, Mahlet G Tadesse², Michelle A Williams³, Jesse R Fann⁴, Ann Vander Stoep⁵, Xiao-Hua Andrew Zhou⁶

Affiliations

¹ Department of Epidemiology, Harvard School of Public Health, Boston, MA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA.
² Department of Mathematics and Statistics, Georgetown University, Washington, DC.
³ Department of Epidemiology, Harvard School of Public Health, Boston, MA.
⁴ Departments of Psychiatry and Behavioral Sciences, Rehabilitation Medicine and Epidemiology, Seattle, WA.
⁵ Department of Epidemiology, University of Washington School of Public Health, Seattle, WA.
⁶ Department of Biostatistics, University of Washington School of Public Health, Seattle, WA.

Abstract

Purpose: We evaluated the extent to which the use of a hypothesized imperfect gold standard, the Composite International Diagnostic Interview (CIDI), biases estimates of the diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9). We also evaluated how statistical correction can be used to address this bias.

Methods: The study was conducted among 926 adults who completed structured interviews in which information about current major depressive disorder was collected using the PHQ-9 and the CIDI. First, we evaluated the relative psychometric properties of the PHQ-9 using the CIDI as the gold standard. Next, we used a Bayesian latent class model to correct for the bias arising from the imperfect gold standard.
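The bias that an imperfect reference introduces can be made concrete with a small calculation. The sketch below assumes, for illustration only, that the index test (in the role of the PHQ-9) and the reference (in the role of the CIDI) err independently given true disease status; the abstract does not state this assumption for the first analysis. The helper names (`Accuracy`, `relative_accuracy`, `apparent_accuracy`) and all numeric values in the example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float


def relative_accuracy(tp, fp, fn, tn):
    """Sensitivity/specificity of the index test, treating the reference as truth.

    tp: index+ ref+, fp: index+ ref-, fn: index- ref+, tn: index- ref-.
    Works with observed counts or with cell probabilities.
    """
    return Accuracy(tp / (tp + fn), tn / (tn + fp))


def apparent_accuracy(prev, index, ref):
    """Expected 'relative' accuracy of `index` when judged against an imperfect `ref`,
    assuming (hypothetically) conditional independence given true disease status."""
    s1, c1 = index.sensitivity, index.specificity
    s2, c2 = ref.sensitivity, ref.specificity
    # Joint probability of each (index result, reference result) pattern.
    p_pp = prev * s1 * s2 + (1 - prev) * (1 - c1) * (1 - c2)
    p_pn = prev * s1 * (1 - s2) + (1 - prev) * (1 - c1) * c2
    p_np = prev * (1 - s1) * s2 + (1 - prev) * c1 * (1 - c2)
    p_nn = prev * (1 - s1) * (1 - s2) + (1 - prev) * c1 * c2
    return relative_accuracy(tp=p_pp, fp=p_pn, fn=p_np, tn=p_nn)


# Hypothetical values: an index test with true sensitivity and specificity of 0.80,
# judged against a reference with sensitivity 0.70 and specificity 0.95, prevalence 0.20.
acc = apparent_accuracy(0.20, Accuracy(0.80, 0.80), Accuracy(0.70, 0.95))
print(f"{acc.sensitivity:.3f} {acc.specificity:.3f}")  # 0.667 0.756
```

Both apparent values fall below the true 0.80, illustrating how an imperfect reference can understate an index test's accuracy. The direction and size of the bias depend on prevalence and on the reference's error rates, and correlated errors between the two instruments would change the picture.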

Results: In comparison with the CIDI, the relative sensitivity and specificity of the PHQ-9 for detecting major depressive disorder at a cut point of 10 or more were 53.1% (95% confidence interval, 45.4%-60.8%) and 77.5% (95% confidence interval, 74.5%-80.5%), respectively. Using a Bayesian latent class model to correct for the bias arising from the use of an imperfect gold standard increased the estimated sensitivity and specificity of the PHQ-9 to 79.8% (95% Bayesian credible interval, 64.9%-90.8%) and 79.1% (95% Bayesian credible interval, 74.7%-83.7%), respectively.
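The abstract does not report the form of the Bayesian latent class model or its priors. As a hedged sketch, the code below implements one common formulation: a data-augmentation Gibbs sampler for two binary tests applied to a single population, assuming conditional independence given the latent disease status and Beta priors on prevalence and on each test's sensitivity and specificity. With two tests in one population, the 2×2 table supplies only three degrees of freedom for five parameters, so informative priors (here placed on the reference test) are needed for the posterior to be useful. The function name `gibbs_two_test_lcm`, the counts, and the prior values are hypothetical.

```python
import numpy as np

# Cell order: (index result, reference result) = ++, +-, -+, --
CELLS = ("++", "+-", "-+", "--")
INDEX_POS = np.array([True, True, False, False])
REF_POS = np.array([True, False, True, False])
PARAMS = ("prev", "se1", "sp1", "se2", "sp2")


def gibbs_two_test_lcm(counts, priors, n_iter=20000, burn_in=5000, seed=0):
    """Hypothetical Gibbs sampler for a two-test, conditionally independent latent class model.

    counts: dict mapping each cell in CELLS to an observed count.
    priors: dict mapping each name in PARAMS to Beta (alpha, beta) parameters;
            se1/sp1 belong to the index test, se2/sp2 to the reference test.
    Returns posterior draws (after burn-in) with columns ordered as PARAMS.
    """
    rng = np.random.default_rng(seed)
    n = np.array([counts[c] for c in CELLS])

    def draw_beta(name, successes, failures):
        # Conjugate Beta update: prior pseudo-counts plus latent successes/failures.
        a, b = priors[name]
        return rng.beta(a + successes, b + failures)

    prev, se1, sp1, se2, sp2 = 0.5, 0.8, 0.8, 0.8, 0.8  # starting values
    draws = []
    for it in range(n_iter):
        # Joint probability of each result pattern with disease / without disease.
        p_d = prev * np.where(INDEX_POS, se1, 1 - se1) * np.where(REF_POS, se2, 1 - se2)
        p_nd = (1 - prev) * np.where(INDEX_POS, 1 - sp1, sp1) * np.where(REF_POS, 1 - sp2, sp2)
        # Data augmentation: latent number of truly diseased people in each cell.
        y = rng.binomial(n, p_d / (p_d + p_nd))
        m = n - y  # latent number of non-diseased people in each cell

        prev = draw_beta("prev", y.sum(), m.sum())
        se1 = draw_beta("se1", y[INDEX_POS].sum(), y[~INDEX_POS].sum())
        sp1 = draw_beta("sp1", m[~INDEX_POS].sum(), m[INDEX_POS].sum())
        se2 = draw_beta("se2", y[REF_POS].sum(), y[~REF_POS].sum())
        sp2 = draw_beta("sp2", m[~REF_POS].sum(), m[REF_POS].sum())
        if it >= burn_in:
            draws.append((prev, se1, sp1, se2, sp2))
    return np.array(draws)


if __name__ == "__main__":
    # Hypothetical counts and priors, not data from the study.
    counts = {"++": 120, "+-": 200, "-+": 60, "--": 620}
    priors = {"prev": (1, 1), "se1": (1, 1), "sp1": (1, 1),
              "se2": (70, 30), "sp2": (190, 10)}
    draws = gibbs_two_test_lcm(counts, priors)
    for name, col in zip(PARAMS, draws.T):
        lower, upper = np.percentile(col, [2.5, 97.5])
        print(f"{name}: mean {col.mean():.3f}, 95% CrI {lower:.3f}-{upper:.3f}")
```

Posterior means and the 2.5th and 97.5th percentiles of the draws give point estimates and 95% credible intervals of the kind reported above. With real data, convergence diagnostics and the sensitivity of the results to the priors on the reference test would need to be checked.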

Conclusions: Our results provide evidence that the diagnostic validity of mental health screening instruments can be assessed with appropriate statistical methods when a gold standard may not be available.

Keywords: Bayesian analysis; Depression; Imperfect gold standard; Screening.

Publication types

Evaluation Study
Research Support, N.I.H., Extramural

MeSH terms

Adult
Bayes Theorem
Depression / diagnosis*
Female
Humans
Male
Mass Screening / instrumentation*
Mental Health
Middle Aged
Psychometrics / statistics & numerical data*
Reproducibility of Results
Sensitivity and Specificity
Surveys and Questionnaires / standards*
