Predicting Depression among Patients with Diabetes Using Longitudinal Data. A Multilevel Regression Model

Methods Inf Med. 2015;54(6):553-9. doi: 10.3414/ME14-02-0009. Epub 2015 Nov 18.

Abstract

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare".

Background: Depression is a common and often undiagnosed condition among patients with diabetes. It significantly affects healthcare outcomes, use, and cost, and it elevates suicide risk. A model that predicts depression among patients with diabetes is therefore a promising and valuable tool for providers to proactively assess depressive symptoms and identify those with depression.

Objectives: This study seeks to develop a generalized multilevel regression model, using a longitudinal data set from a recent large-scale clinical trial, to predict depression severity and presence of major depression among patients with diabetes.

Methods: Severity of depression was measured by the Patient Health Questionnaire (PHQ-9) score. Predictors were selected from 29 candidate factors to develop a 2-level Poisson regression model that can make population-average predictions for all patients and subject-specific predictions for individual patients with historical records. Newly obtained patient records can be incorporated with historical records to update the prediction model. Root-mean-square error (RMSE) was used to evaluate the predictive accuracy of PHQ-9 scores. The study also evaluated how well the predicted PHQ-9 scores classify patients as having major depression.
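
As a reading aid, one common way to write a 2-level Poisson model of this kind is sketched below. The abstract does not give the random-effects structure or the exact definition of the two prediction types, so the random intercept u_i, its normal distribution, and the two prediction formulas are assumptions for illustration only. Here y_ij is the PHQ-9 score of patient i at visit j, x_ij collects the time-invariant and time-varying predictors, and N is the number of predicted scores.

```latex
\begin{align}
y_{ij} \mid u_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \\
% subject-specific prediction: uses the patient's estimated random intercept
\hat{y}_{ij}^{\mathrm{SS}} &= \exp\!\left(\mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_i\right), \\
% population-average prediction: random intercept integrated out (log-normal mean)
\hat{y}_{ij}^{\mathrm{PA}} &= \exp\!\left(\mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}} + \hat{\sigma}_u^2 / 2\right), \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i,j}\left(y_{ij} - \hat{y}_{ij}\right)^2}.
\end{align}
```

Under these assumptions, a patient with no historical records has no estimate of u_i, so only the population-average form applies; once records accumulate, the subject-specific form becomes available.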

Results: Two time-invariant and 10 time-varying predictors were selected for the model. Incorporating historical records and using them to update the model may improve both the predictive accuracy of PHQ-9 scores and the classification ability of the predicted scores. Subject-specific predictions (for individual patients with historical records) achieved an RMSE of about 4 and an area under the receiver operating characteristic (ROC) curve of about 0.9, outperforming population-average predictions.
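
The two evaluation metrics reported here can be computed as in the following minimal sketch. It is not the authors' code: the scores are hypothetical toy values rather than study data, and the PHQ-9 cutoff of 10 used to label major depression is an assumption (a commonly used screening threshold), since the abstract does not state how major depression was defined.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive case scores higher than a
    random negative case, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumption: PHQ-9 >= 10 labels major depression (not stated in the abstract).
PHQ9_CUTOFF = 10

# Hypothetical toy values, for illustration only.
observed = [2, 5, 12, 18, 7, 15]
predicted = [3.1, 6.0, 10.5, 14.2, 9.0, 16.3]
labels = [score >= PHQ9_CUTOFF for score in observed]

print("RMSE:", round(rmse(observed, predicted), 3))
print("AUC:", roc_auc(labels, predicted))
```

The AUC uses the predicted score itself as the ranking variable, so it does not depend on choosing a cutoff for the predictions; the cutoff is needed only to derive the reference labels from the observed scores.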

Conclusions: The study developed a generalized multilevel regression model to predict depression and demonstrated that using generalized multilevel regression based on longitudinal patient records can achieve high predictive ability.

Keywords: Depression; comorbidity; diabetes mellitus; machine learning; multilevel regression.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • California / epidemiology
  • Causality
  • Computer Simulation
  • Decision Support Systems, Clinical / organization & administration
  • Depression / diagnosis*
  • Depression / epidemiology*
  • Diabetes Complications / diagnosis*
  • Diabetes Complications / epidemiology*
  • Electronic Health Records / classification*
  • Electronic Health Records / statistics & numerical data
  • Humans
  • Longitudinal Studies
  • Machine Learning
  • Natural Language Processing
  • Prevalence
  • Prognosis
  • Proportional Hazards Models*
  • Regression Analysis
  • Reproducibility of Results
  • Risk Factors
  • Sensitivity and Specificity