Background: One way to examine whether major depressive disorder (MDD) is related to the development of substance use disorder (SUD) is to leverage naturalistic data available in the electronic health record (EHR). Accurate extraction requires rules for data extraction and variable construction, supported by psychometric evidence validating their use.
Objective: We propose and validate a methodologic framework for using EHR variables to identify patients with MDD and non-nicotine SUD.
Methods: Proxy diagnoses and index dates of MDD and/or SUD were established using billing codes, problem lists, patient-reported outcome measures, and prescriptions. Manual chart reviews were conducted for the 1-year period surrounding each index date to determine (1) whether proxy diagnoses were supported by chart notes and (2) whether the index dates accurately captured disorder onset.
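As an illustration only, the index-date step can be sketched in Python. The abstract does not state how the index date is chosen among the evidence sources, so the rule used here (earliest qualifying event) is an assumption, as are the function names, the source labels, and the use of 183 days to approximate 6 months.

```python
from datetime import date

# Hypothetical labels for the four evidence categories named in the Methods.
SOURCES = {"billing_code", "problem_list", "patient_reported_outcome", "prescription"}


def proxy_index_date(events):
    """Return the earliest date among qualifying events, or None if there are none.

    Assumption: the index date is the first qualifying event; the source
    does not confirm this rule.
    """
    dates = [event_date for source, event_date in events if source in SOURCES]
    return min(dates) if dates else None


def is_discrepant(ehr_index, chart_onset, threshold_days=183):
    """Flag an EHR index date that is more than ~6 months from chart-review onset."""
    return abs((ehr_index - chart_onset).days) > threshold_days


# Made-up events for one patient; "lab_result" is ignored as a non-qualifying source.
events = [
    ("prescription", date(2020, 3, 14)),
    ("billing_code", date(2020, 1, 9)),
    ("lab_result", date(2019, 11, 2)),
]
index = proxy_index_date(events)
```

Calling `is_discrepant(index, chart_onset)` with a chart-review onset date then marks whether the extracted index date falls outside the assumed 6-month window.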
Results: Proxy diagnoses of MDD had a positive predictive value of 100%. Proxy diagnoses of SUD showed strong agreement with manual chart review (Cohen's kappa of 0.84), with sensitivity, specificity, positive predictive value, and negative predictive value each at 92%. For 16% of patients, the SUD index date generated by EHR extraction was inaccurate, differing by more than 6 months from the SUD onset identified through chart review.
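For readers unfamiliar with these statistics, the following minimal Python sketch shows how they are derived from a 2x2 table comparing proxy diagnoses with chart review. The function name and the counts are hypothetical and are not the study's data; the counts were chosen only because a balanced table of this shape yields values like those reported.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute agreement statistics for a proxy diagnosis against chart review.

    tp/fp/fn/tn are counts from a 2x2 table in which chart review is the
    reference standard and the proxy diagnosis is the test.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # proportion of cases where both methods agree
    # Agreement expected by chance, from the marginal totals of each method.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=92, fp=8, fn=8, tn=92)
```

With these illustrative counts, all four accuracy measures equal 0.92 and Cohen's kappa is approximately 0.84, because chance agreement in a balanced table is 0.5.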
Conclusions: Our methodology was very effective in identifying patients with MDD with or without SUD and moderately effective in identifying SUD onset date. These findings support the use of EHR data to make proxy diagnoses of MDD with or without SUD.
Keywords: Depression; Electronic health record; Methods; Substance use; Validation.