Medication Extraction from Electronic Clinical Notes in an Integrated Health System: A Study on Aspirin Use in Patients with Nonvalvular Atrial Fibrillation

Clin Ther. 2015 Sep;37(9):2048-2058.e2. doi: 10.1016/j.clinthera.2015.07.002. Epub 2015 Jul 29.

Abstract

Purpose: The purpose of this study was to investigate whether aspirin use can be captured from the clinical notes in a nonvalvular atrial fibrillation population.

Methods: A total of 29,507 patients with newly diagnosed nonvalvular atrial fibrillation were identified from January 1, 2006, through December 31, 2011, and were followed up through December 31, 2012. More than 3 million clinical notes were retrieved from electronic medical records. A training data set of 2949 notes was created to develop a computer-based method to automatically extract aspirin use status and dosage information using natural language processing (NLP). A gold standard data set of 5339 notes was created using a blinded manual review. NLP results were validated against the gold standard data set. The aspirin data from the structured medication databases were also compared with the results from NLP. Positive and negative predictive values, along with sensitivity and specificity, were calculated.
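The abstract does not describe the NLP rules themselves. As a rough illustration only, a sentence-level keyword-and-pattern approach to pulling aspirin status and dose out of a note could look like the sketch below. Everything in it is a hypothetical assumption and not the authors' method: the `extract_aspirin` function, the term list (`aspirin`, `asa`, `ecotrin`), the negation cues, and the 162 mg cutoff used here to label a dose as "low".

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lexicon and negation cues, for illustration only (not the study's rules).
ASPIRIN_TERM = re.compile(r"\b(aspirin|asa|ecotrin)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(no|not|denies|stopped|discontinued|off)\b", re.IGNORECASE)
DOSE = re.compile(r"(\d+)\s*mg\b", re.IGNORECASE)
LOW_DOSE_MAX_MG = 162  # assumed cutoff for calling a dose "low"


@dataclass
class AspirinMention:
    status: str              # "current", "not current", or "no mention"
    dose_mg: Optional[int]   # first dose found after a non-negated mention
    low_dose: Optional[bool]  # None when no dose was found


def extract_aspirin(note: str) -> AspirinMention:
    """Crude sentence-level rules: an aspirin mention in a sentence that
    contains any negation cue counts as negative; otherwise it counts as
    current use."""
    positive_doses: List[Optional[int]] = []
    negative_seen = False
    for sentence in re.split(r"[.;\n]+", note):
        term = ASPIRIN_TERM.search(sentence)
        if term is None:
            continue
        if NEGATION_CUE.search(sentence):
            negative_seen = True
            continue
        # Look for a dose only after the aspirin mention in the same sentence,
        # so a dose belonging to an earlier drug is not picked up.
        dose = DOSE.search(sentence, term.end())
        positive_doses.append(int(dose.group(1)) if dose else None)

    if positive_doses:
        dose_mg = next((d for d in positive_doses if d is not None), None)
        low = None if dose_mg is None else dose_mg <= LOW_DOSE_MAX_MG
        return AspirinMention("current", dose_mg, low)
    if negative_seen:
        return AspirinMention("not current", None, None)
    return AspirinMention("no mention", None, None)
```

For example, `extract_aspirin("Takes aspirin 81 mg daily. No chest pain.")` yields a current, 81 mg, low-dose mention, while `extract_aspirin("Aspirin discontinued due to GI bleed.")` yields "not current". The whole-sentence negation rule is deliberately crude: "Aspirin 81 mg daily, no bleeding" would be misread as negative, which is why practical systems scope negation more carefully (for example with NegEx-style trigger windows).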

Findings: NLP achieved 95.5% sensitivity and 98.9% specificity when compared with the gold standard data set. The positive predictive value was 93.0%, and the negative predictive value was 99.3%. NLP identified aspirin use for 83.8% of the study population, and 70% of the low-dose aspirin use was identified only by the NLP method.
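The four reported measures come from a 2x2 table that cross-classifies the NLP output against the manual-review gold standard. The sketch below shows the standard formulas; the `ConfusionCounts` type, the `validation_metrics` function, and the counts in the example are hypothetical and are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # NLP positive, gold standard positive
    fp: int  # NLP positive, gold standard negative
    fn: int  # NLP negative, gold standard positive
    tn: int  # NLP negative, gold standard negative


def validation_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard 2x2 validation metrics, treating manual review as truth."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true users that NLP flags
        "specificity": c.tn / (c.tn + c.fp),  # share of non-users that NLP clears
        "ppv": c.tp / (c.tp + c.fp),          # share of NLP positives that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of NLP negatives that are correct
    }


# Hypothetical counts for illustration only; not taken from the study.
example = ConfusionCounts(tp=90, fp=10, fn=10, tn=390)
for name, value in validation_metrics(example).items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the predictive values also depend on how common aspirin use is among the evaluated notes, so they can differ from sensitivity and specificity even for the same classifier.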

Implications: We developed and validated an NLP method specifically designed to identify low-dose aspirin use status from the clinical notes with high accuracy. This method can be a valuable tool to supplement existing structured medication data.
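One simple way to quantify how much the notes add to structured medication data is to compare the sets of patients flagged by each source. The sketch below is a hypothetical illustration: the `source_overlap` function and the patient IDs are invented, and it assumes the denominator for the "identified only by NLP" share is every patient flagged by either source, which the abstract does not state explicitly.

```python
from typing import Dict, Set


def source_overlap(nlp_ids: Set[str], structured_ids: Set[str]) -> Dict[str, float]:
    """Summarize how patients flagged as aspirin users split across sources.

    Assumes the denominator is the union of both sources (an assumption,
    not confirmed by the abstract).
    """
    any_source = nlp_ids | structured_ids   # flagged by at least one source
    nlp_only = nlp_ids - structured_ids     # missed by structured data
    return {
        "total_users": len(any_source),
        "nlp_only": len(nlp_only),
        "share_nlp_only": len(nlp_only) / len(any_source) if any_source else 0.0,
    }


# Hypothetical patient IDs for illustration only.
print(source_overlap({"p1", "p2", "p3", "p4"}, {"p3", "p4", "p5"}))
```

Here two of the five flagged patients appear only in the NLP output, so the NLP-only share is 0.4; in practice, the structured records would typically be pharmacy dispensings or medication lists, which often miss over-the-counter aspirin.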

Keywords: aspirin; electronic clinical notes; electronic medical record; integrated health systems; medication status; natural language processing.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • Aspirin / therapeutic use*
  • Atrial Fibrillation / drug therapy
  • California
  • Delivery of Health Care, Integrated
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods
  • Natural Language Processing*
  • Platelet Aggregation Inhibitors / therapeutic use*
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Platelet Aggregation Inhibitors
  • Aspirin