Validation of an alcohol misuse classifier in hospitalized patients

Alcohol. 2020 May;84:49-55. doi: 10.1016/j.alcohol.2019.09.008. Epub 2019 Sep 28.

Abstract

Background: Current modes of identifying alcohol misuse in hospitalized patients rely on self-report questionnaires and diagnostic codes that have limitations, including low sensitivity. Information in the clinical notes of the electronic health record (EHR) may further augment the identification of alcohol misuse. Natural language processing (NLP) with supervised machine learning has been successful at analyzing clinical notes and identifying cases of alcohol misuse in trauma patients.

Methods: An alcohol misuse NLP classifier, previously developed on trauma patients who completed the Alcohol Use Disorders Identification Test, was validated in a cohort of 1000 hospitalized patients at a large, tertiary health system between January 1, 2007 and September 1, 2017. The clinical notes were processed using the clinical Text Analysis and Knowledge Extraction System. The National Institute on Alcohol Abuse and Alcoholism (NIAAA) guidelines for alcohol misuse were used during annotation of the medical records in our validation dataset.

Results: The alcohol misuse classifier had an area under the receiver operating characteristic curve of 0.91 (95% CI 0.90-0.93) in the cohort of hospitalized patients. The sensitivity, specificity, positive predictive value, and negative predictive value were 0.88 (95% CI 0.85-0.90), 0.78 (95% CI 0.74-0.82), 0.85 (95% CI 0.82-0.87), and 0.82 (95% CI 0.78-0.86), respectively. The Hosmer-Lemeshow test (p = 0.13) indicated good model fit. Additionally, there was a dose-dependent response in alcohol consumption behaviors across increasing strata of predicted probabilities for alcohol misuse.
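The results above rest on standard definitions of discrimination and calibration. As an illustration only, the Python sketch below computes sensitivity, specificity, PPV, and NPV from 0/1 labels and predictions. It also estimates the AUC with the rank-based (Mann-Whitney) formulation and computes a Hosmer-Lemeshow statistic over groups of sorted predicted risk. This is not the authors' code. The function names are hypothetical. The 0.5 cut-off in the usage example is an assumption, since the abstract does not give the classifier's threshold. Confidence intervals are omitted because the abstract does not say how they were derived. The closed-form chi-square tail used for the p-value is valid only for even degrees of freedom, such as 8 for the usual 10 groups.

```python
import math
from typing import Dict, Sequence, Tuple


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # true cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "npv": tn / (tn + fn),          # cleared patients who are true non-cases
    }


def rank_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random case scores above random non-case); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def chi2_sf_even(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true: Sequence[int], probs: Sequence[float],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2).

    Assumes every predicted probability lies strictly between 0 and 1, so no
    group has zero expected events or zero expected non-events.
    """
    ranked = sorted(zip(probs, y_true))  # order patients by predicted risk
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            raise ValueError("more groups than observations")
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Pearson terms for events and non-events in this risk group
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, chi2_sf_even(stat, groups - 2)


if __name__ == "__main__":
    # Hypothetical toy data, not the study cohort.
    labels = [1, 1, 1, 0, 0, 1, 0, 0]
    probs = [0.9, 0.7, 0.4, 0.3, 0.6, 0.8, 0.2, 0.1]
    preds = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 cut-off
    print(diagnostic_metrics(labels, preds))
    print(round(rank_auc(labels, probs), 3))
```

The pairwise AUC loop is quadratic in cohort size, which is acceptable for a cohort of about 1000 patients. For larger datasets, a sort-based rank-sum formulation would be preferable.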

Conclusion: The alcohol misuse NLP classifier had good discrimination and test characteristics in hospitalized patients. An approach using the clinical notes with NLP and supervised machine learning may better identify alcohol misuse cases than conventional methods solely relying on billing diagnostic codes.

Keywords: alcohol use disorder; ethanol; machine learning; natural language processing; predictive value of tests.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Validation Study

MeSH terms

  • Adult
  • Alcoholism / diagnosis*
  • Case-Control Studies
  • Electronic Health Records*
  • Female
  • Humans
  • Inpatients*
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Supervised Machine Learning*
  • Tertiary Care Centers