Assessing validity of a depression screening instrument in the absence of a gold standard

Ann Epidemiol. 2014 Jul;24(7):527-31. doi: 10.1016/j.annepidem.2014.04.009. Epub 2014 May 2.

Abstract

Purpose: We evaluated the extent to which use of a hypothesized imperfect gold standard, the Composite International Diagnostic Interview (CIDI), biases estimates of the diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9). We also evaluated how statistical correction can be used to address this bias.

Methods: The study included 926 adults, who completed structured interviews assessing current major depressive disorder with the PHQ-9 and CIDI instruments. First, we evaluated the relative psychometric properties of PHQ-9 using CIDI as a gold standard. Next, we used a Bayesian latent class model to correct for the bias arising from the use of an imperfect gold standard.
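
As an illustration of the first step, relative sensitivity and specificity can be computed from a 2x2 cross-classification of PHQ-9 results against CIDI. The Python sketch below assumes a Wald (normal-approximation) interval, because the abstract does not state which interval method was used. The function names and the counts are hypothetical placeholders, not the study data.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion with a Wald interval.

    The Wald interval is an assumption of this sketch; the abstract does not
    say how its confidence intervals were computed.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def relative_accuracy(tp, fn, fp, tn):
    """Sensitivity and specificity of the screen relative to the reference.

    tp, fn: reference-positive participants who screen positive / negative.
    fp, tn: reference-negative participants who screen positive / negative.
    """
    return {
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        "specificity": proportion_with_wald_ci(tn, tn + fp),
    }


# Hypothetical counts, chosen only to exercise the functions.
example = relative_accuracy(tp=80, fn=20, fp=90, tn=310)
for name, (estimate, lower, upper) in example.items():
    print(f"{name}: {estimate:.3f} ({lower:.3f}, {upper:.3f})")
```

These values are "relative" because they treat the reference classification as the truth; any misclassification by the reference carries over into the estimates.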

Results: In comparison with CIDI, the relative sensitivity and specificity of the PHQ-9 for detecting major depressive disorder at a cut point of 10 or more were 53.1% (95% confidence interval, 45.4%-60.8%) and 77.5% (95% confidence interval, 74.5%-80.5%), respectively. Using a Bayesian latent class model to correct for the bias arising from the use of an imperfect gold standard increased the estimated sensitivity and specificity of PHQ-9 to 79.8% (95% Bayesian credible interval, 64.9%-90.8%) and 79.1% (95% Bayesian credible interval, 74.7%-83.7%), respectively.
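
The direction of this bias can be illustrated with a simple forward calculation. The Python sketch below is not the Bayesian latent class model used in the study. It assumes that the screen and the reference err independently given true disease status, and it shows how the accuracy measured against an imperfect reference differs from the true accuracy. The function name and all numeric inputs are hypothetical.

```python
def apparent_accuracy(prevalence, se_test, sp_test, se_ref, sp_ref):
    """Accuracy of a test as measured against an imperfect reference.

    Assumes conditional independence of the two instruments given true
    disease status (an assumption of this sketch, not a claim about the
    study's model). Returns (apparent_sensitivity, apparent_specificity).
    """
    healthy = 1 - prevalence

    # P(test+, reference+) and P(reference+), summed over true status.
    both_positive = (prevalence * se_test * se_ref
                     + healthy * (1 - sp_test) * (1 - sp_ref))
    ref_positive = prevalence * se_ref + healthy * (1 - sp_ref)

    # P(test-, reference-) and P(reference-), summed over true status.
    both_negative = (prevalence * (1 - se_test) * (1 - se_ref)
                     + healthy * sp_test * sp_ref)
    ref_negative = prevalence * (1 - se_ref) + healthy * sp_ref

    return both_positive / ref_positive, both_negative / ref_negative


# Hypothetical inputs: a screen with 80% true sensitivity and specificity,
# evaluated against a reference that misses some cases.
app_se, app_sp = apparent_accuracy(
    prevalence=0.10, se_test=0.80, sp_test=0.80, se_ref=0.60, sp_ref=0.90
)
print(f"apparent sensitivity: {app_se:.3f}")
print(f"apparent specificity: {app_sp:.3f}")
```

With a perfect reference (`se_ref=1.0`, `sp_ref=1.0`) the function returns the true values unchanged. With an imperfect reference under these assumptions, the apparent sensitivity falls below the true value, which is the kind of distortion a latent class model is intended to correct.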

Conclusions: Our results provided evidence that the diagnostic validity of a mental health screening instrument can be assessed with appropriate statistical methods when a gold standard might not be available.

Keywords: Bayesian analysis; Depression; Imperfect gold standard; Screening.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Bayes Theorem
  • Depression / diagnosis*
  • Female
  • Humans
  • Male
  • Mass Screening / instrumentation*
  • Mental Health
  • Middle Aged
  • Psychometrics / statistics & numerical data*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Surveys and Questionnaires / standards*