Identification of patients with epilepsy using automated electronic health records phenotyping

Epilepsia. 2023 Jun;64(6):1472-1481. doi: 10.1111/epi.17589. Epub 2023 Apr 4.

Abstract

Objective: Unstructured data present in electronic health records (EHR) are a rich source of medical information; however, their abstraction is labor intensive. Automated EHR phenotyping (AEP) can reduce the need for manual chart review. We present an AEP model that is designed to automatically identify patients diagnosed with epilepsy.

Methods: The ground truth for model training and evaluation was captured from a combination of structured questionnaires filled out by physicians for a subset of patients and manual chart review using customized software. Modeling features included indicators of the presence of keywords and phrases in unstructured clinical notes, prescriptions for antiseizure medications (ASMs), International Classification of Diseases (ICD) codes for seizures and epilepsy, number of ASMs and epilepsy-related ICD codes, age, and sex. Data were randomly divided into training (70%) and hold-out testing (30%) sets, with distinct patients in each set. We trained regularized logistic regression and an extreme gradient boosting models. Model performance was measured using area under the receiver operating curve (AUROC) and area under the precision-recall curve (AUPRC), with 95% confidence intervals (CI) estimated via bootstrapping.

Results: Our study cohort included 3903 adults drawn from outpatient departments of nine hospitals between February 2015 and June 2022 (mean age = 47 ± 18 years, 57% women, 82% White, 84% non-Hispanic, 70% with epilepsy). The final models included 285 features, including 246 keywords and phrases captured from 8415 encounters. Both models achieved AUROC and AUPRC of 1 (95% CI = .99-1.00) in the hold-out testing set.

Significance: A machine learning-based AEP approach accurately identifies patients with epilepsy from notes, ICD codes, and ASMs. This model can enable large-scale epilepsy research using EHR databases.

Keywords: electronic medical records (EMR); neurology; text mining; unstructured text.

Publication types

  • Randomized Controlled Trial
  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Electronic Health Records
  • Epilepsy* / diagnosis
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Software