Development of a classifier to identify patients with probable Lennox-Gastaut syndrome in health insurance claims databases via random forest methodology

Curr Med Res Opin. 2019 Aug;35(8):1415-1420. doi: 10.1080/03007995.2019.1595552. Epub 2019 Apr 29.

Abstract

Objective: Describe the development of a claims-based classifier utilizing machine learning to identify patients with probable Lennox-Gastaut syndrome (LGS) from six state Medicaid programs.

Methods: Patients were included if they had ≥2 medical claims ≥30 days apart for specified or unspecified epilepsy, excluding those with ≥1 claim for petit mal status. The LGS classifier utilized a random forest algorithm, a compilation of thousands of binary decision trees in which machine-generated predictor variables split the data set into branches that predict the presence or absence of LGS. To construct the splitting rules, the importance of each candidate variable was determined by calculating the mean decrease in Gini impurity. Training and testing were performed on two data sets (30% and 70%) using a "true" LGS and non-LGS patient population. Performance was compared with logistic regression and single tree methodology.

Results: Using a 60% probability threshold, which yielded the highest sensitivity (97.3%) and specificity (95.6%), the classifier identified approximately 4% of patients with epilepsy as probable LGS. The most important input variables included number of distinct antiepileptic drugs received, epilepsy-related outpatient/inpatient visits, electroencephalogram procedures and claims for delayed development. The random forest methodology outperformed logistic regression and single tree methodology. Most of the important LGS predictor characteristics identified by the classifier were statistically significantly associated with LGS status (p < .05).

Conclusions: The claims-based LGS classifier showed high sensitivity and specificity, outperformed single tree and logistic regression methodologies and identified a prevalence of probable LGS that was similar to previously published estimates.
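
Illustrative sketch (not from the paper): the two quantities the abstract relies on, the decrease in Gini impurity produced by a candidate split and the sensitivity/specificity obtained at a probability threshold, can be computed as below. The function names, the toy values for "number of distinct antiepileptic drugs", the predicted probabilities, and the use of ≥ at the 60% threshold are all assumptions made for illustration. In a random forest, the per-split decrease shown here would be accumulated over every split that uses a variable and averaged across trees to give that variable's mean decrease in Gini impurity.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, cutoff):
    """Impurity decrease from one binary split: value <= cutoff vs value > cutoff."""
    left = [y for x, y in zip(values, labels) if x <= cutoff]
    right = [y for x, y in zip(values, labels) if x > cutoff]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


def sensitivity_specificity(probs, labels, threshold):
    """Flag a patient as probable LGS when prob >= threshold (assumed rule).

    Assumes both classes (1 = LGS, 0 = non-LGS) are present in `labels`.
    """
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, not taken from the study.
n_aeds = [1, 2, 2, 3, 5, 6, 7, 8]          # distinct antiepileptic drugs
is_lgs = [0, 0, 0, 0, 1, 1, 1, 1]          # 1 = "true" LGS, 0 = non-LGS
forest_prob = [0.05, 0.10, 0.30, 0.65, 0.55, 0.80, 0.90, 0.95]

decrease = gini_decrease(n_aeds, is_lgs, cutoff=4)
sens, spec = sensitivity_specificity(forest_prob, is_lgs, threshold=0.60)
print(f"Gini decrease for split at 4 drugs: {decrease:.2f}")
print(f"Sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

Repeating `sensitivity_specificity` over a grid of thresholds is one simple way to pick the cut-off that balances the two measures; whether the authors selected their threshold this way is not stated in the abstract.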

Keywords: Epilepsy; LGS; LGS classifier; Lennox–Gastaut syndrome; machine learning; random forest methodology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Administrative Claims, Healthcare
  • Databases, Factual
  • Decision Trees
  • Humans
  • Lennox Gastaut Syndrome / diagnosis*
  • Medicaid*
  • Models, Statistical*
  • Status Epilepticus
  • United States