Development and Validation of an Algorithm to Identify Nonalcoholic Fatty Liver Disease in the Electronic Medical Record

Dig Dis Sci. 2016 Mar;61(3):913-9. doi: 10.1007/s10620-015-3952-x. Epub 2015 Nov 4.

Abstract

Background and aims: Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Risk factors for NAFLD disease progression and liver-related outcomes remain incompletely understood due to the lack of computational identification methods. The present study sought to design a classification algorithm for NAFLD within the electronic medical record (EMR) for the development of large-scale longitudinal cohorts.

Methods: We implemented feature selection using logistic regression with adaptive LASSO. A training set of 620 patients was randomly selected from the Research Patient Data Registry at Partners Healthcare. To establish the true NAFLD diagnosis, we performed chart reviews and accepted either documentation of a biopsy or a clinical diagnosis of NAFLD. Model variables included laboratory measurements, diagnosis codes, and concepts extracted from medical notes. Variables with P < 0.05 were included in the multivariable analysis.
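The feature-selection step named in the Methods can be illustrated with a minimal sketch of adaptive LASSO for logistic regression. The abstract does not state the software, solver, or tuning values used, so everything here is an assumption for illustration: the proximal gradient solver, the function names, the parameters lam, gamma, step, and iters, and the synthetic two-feature data (one informative feature, one noise feature). Features are assumed to be roughly standardized.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_logistic(X, y, penalties, step=0.5, iters=500):
    # Proximal gradient descent for logistic regression.
    # penalties[j] is the L1 penalty on coefficient j; the intercept is unpenalized.
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        grad0, grad = 0.0, [0.0] * p
        for row, label in zip(X, y):
            err = sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - label
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        intercept -= step * grad0
        beta = [soft_threshold(beta[j] - step * grad[j], step * penalties[j])
                for j in range(p)]
    return intercept, beta

def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-6):
    # Stage 1: unpenalized pilot fit gives initial coefficient estimates.
    _, pilot = fit_logistic(X, y, [0.0] * len(X[0]))
    # Stage 2: penalize each coefficient in inverse proportion to its pilot
    # magnitude, so weak features are shrunk harder than strong ones.
    penalties = [lam / (abs(b) ** gamma + eps) for b in pilot]
    return fit_logistic(X, y, penalties)

# Synthetic data (NOT study data): feature 0 drives the outcome, feature 1 is noise.
random.seed(0)
X, y = [], []
for _ in range(100):
    signal, noise = random.gauss(0, 1), random.gauss(0, 1)
    X.append([signal, noise])
    y.append(1 if random.random() < sigmoid(2.0 * signal) else 0)

intercept, beta = adaptive_lasso(X, y, lam=0.05)
print("selected features:", [j for j, b in enumerate(beta) if b != 0.0])
```

Features whose coefficients are driven exactly to zero are dropped; the survivors form the classification model. In practice lam would be chosen by cross-validation or an information criterion, which this sketch omits.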

Results: The NAFLD classification algorithm included the number of natural language mentions of NAFLD in the EMR, the lifetime number of ICD-9 codes for NAFLD, and triglyceride level. This classification algorithm was superior to an algorithm using ICD-9 data alone, with an AUC of 0.85 versus 0.75 (P < 0.0001), and led to the creation of a new independent cohort of 8458 individuals with a high probability of NAFLD.
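The AUC comparison reported in the Results can be made concrete with a short sketch of how an AUC is computed from classifier scores and chart-review labels. The function name, the six-patient toy data, and the score values below are invented for illustration and are not study data; the significance test behind the reported P value is not reproduced here.

```python
def auc(scores, labels):
    # Probability that a randomly chosen case scores higher than a randomly
    # chosen non-case, counting ties as one half (Mann-Whitney formulation).
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): 1 = NAFLD on chart review, 0 = no NAFLD.
labels = [1, 1, 1, 0, 0, 0]
icd9_counts = [3, 0, 1, 0, 1, 0]                    # ICD-9 code counts alone
combined_scores = [0.9, 0.4, 0.7, 0.2, 0.5, 0.1]    # multi-feature model output

print("ICD-9 only AUC:", round(auc(icd9_counts, labels), 3))
print("Combined AUC:  ", round(auc(combined_scores, labels), 3))
```

Because AUC depends only on how scores rank cases against non-cases, raw code counts and model probabilities can be compared on the same scale.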

Conclusions: The NAFLD classification algorithm is superior to ICD-9 billing data alone. This approach is simple to develop and deploy, and it can be applied across different institutions to create EMR-based cohorts of individuals with NAFLD.

Keywords: Electronic medical records; Nonalcoholic fatty liver disease; Nonalcoholic steatohepatitis; Triglycerides.

Publication types

  • Research Support, N.I.H., Extramural
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Alanine Transaminase / blood
  • Algorithms*
  • Aspartate Aminotransferases / blood
  • Biopsy
  • Cohort Studies
  • Data Collection
  • Diabetes Mellitus / epidemiology
  • Electronic Health Records*
  • Female
  • Humans
  • International Classification of Diseases
  • Logistic Models
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Non-alcoholic Fatty Liver Disease* / blood
  • Non-alcoholic Fatty Liver Disease* / epidemiology
  • Prevalence
  • Triglycerides / blood
  • United States / epidemiology

Substances

  • Triglycerides
  • Aspartate Aminotransferases
  • Alanine Transaminase