Purpose: This study investigated whether aspirin use can be captured from clinical notes in a population with nonvalvular atrial fibrillation.
Methods: A total of 29,507 patients with newly diagnosed nonvalvular atrial fibrillation were identified from January 1, 2006, through December 31, 2011, and were followed up through December 31, 2012. More than 3 million clinical notes were retrieved from electronic medical records. A training data set of 2949 notes was created to develop a computer-based method that uses natural language processing (NLP) to automatically extract aspirin use status and dosage information. A gold standard data set of 5339 notes was created by blinded manual review, and the NLP results were validated against it. Aspirin data from the structured medication databases were also compared with the NLP results. Sensitivity, specificity, and positive and negative predictive values were calculated.
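The abstract does not describe the NLP rules themselves. Purely as an illustration, the Python sketch below shows one common rule-based approach: find an aspirin mention in each sentence, mark it as negated if a negation cue precedes it, and pull a milligram dose that follows it. The term list, the negation cues, the 100 mg low-dose cutoff, and the names `extract_aspirin` and `is_low_dose` are hypothetical assumptions, not the study's method.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon and cues; NOT the rules used in the study.
ASPIRIN = re.compile(r"\b(?:aspirin|asa|ecotrin)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|not|denies|stopped|discontinued|held)\b", re.IGNORECASE)
DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)


@dataclass
class AspirinMention:
    status: str               # "current" or "negated"
    dose_mg: Optional[float]  # None when no dose is stated


def extract_aspirin(note: str) -> list[AspirinMention]:
    """Return one mention per sentence that names aspirin (naive sketch)."""
    mentions = []
    # Split on semicolons, newlines, or a period followed by whitespace/end,
    # so decimal doses such as "162.5 mg" are not broken apart.
    for sentence in re.split(r"[;\n]|\.(?=\s|$)", note):
        match = ASPIRIN.search(sentence)
        if match is None:
            continue
        # A negation cue anywhere before the mention marks it as negated.
        negated = NEGATION.search(sentence[: match.start()]) is not None
        # Look for a milligram dose after the mention in the same sentence.
        dose = DOSE.search(sentence, match.end())
        mentions.append(
            AspirinMention(
                status="negated" if negated else "current",
                dose_mg=float(dose.group(1)) if dose else None,
            )
        )
    return mentions


def is_low_dose(mention: AspirinMention, threshold_mg: float = 100.0) -> bool:
    """Assumed definition: current use at a stated dose of at most threshold_mg."""
    return (
        mention.status == "current"
        and mention.dose_mg is not None
        and mention.dose_mg <= threshold_mg
    )


if __name__ == "__main__":
    note = "Patient takes aspirin 81 mg daily. Denies ibuprofen use; no ASA before surgery."
    for m in extract_aspirin(note):
        print(m, is_low_dose(m))
```

Sentence-level rules like these are easy to audit against a manually reviewed gold standard, but they miss context that spans sentences; a production system would need a richer lexicon and negation handling.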
Findings: Compared with the gold standard data set, NLP achieved 95.5% sensitivity and 98.9% specificity. The positive predictive value was 93.0%, and the negative predictive value was 99.3%. NLP identified aspirin use for 83.8% of the study population, and 70% of low-dose aspirin use was identified only by the NLP method.
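Each reported metric comes from a 2x2 table of NLP labels against gold standard labels. A minimal Python sketch of that arithmetic follows, using made-up toy labels rather than study data. The helper names are hypothetical, and `nlp_only_fraction` assumes the denominator is every patient flagged by either NLP or the structured data, which the abstract does not state explicitly.

```python
def confusion_counts(predicted: list[bool], gold: list[bool]) -> tuple[int, int, int, int]:
    """Tally (TP, FP, TN, FN) for NLP labels against the gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positives among gold positives
        "specificity": _ratio(tn, tn + fp),  # true negatives among gold negatives
        "ppv": _ratio(tp, tp + fp),          # gold positives among NLP positives
        "npv": _ratio(tn, tn + fn),          # gold negatives among NLP negatives
    }


def nlp_only_fraction(nlp_ids: set[str], structured_ids: set[str]) -> float:
    """Share of patients found by either source that only NLP found (assumed denominator)."""
    found = nlp_ids | structured_ids
    return _ratio(len(nlp_ids - structured_ids), len(found))


if __name__ == "__main__":
    # Toy labels, not study data.
    nlp = [True, True, False, True, False, False, False, False]
    gold = [True, True, True, False, False, False, False, False]
    counts = confusion_counts(nlp, gold)
    print(counts)  # (2, 1, 4, 1)
    print({k: round(v, 3) for k, v in validation_metrics(*counts).items()})
    # {'sensitivity': 0.667, 'specificity': 0.8, 'ppv': 0.667, 'npv': 0.8}
    print(nlp_only_fraction({"p1", "p2", "p3"}, {"p3", "p4"}))  # 0.5
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on how common aspirin mentions are in the validation notes, so they may not transfer directly to a population with a different prevalence.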
Implications: We developed and validated an NLP method specifically designed to identify low-dose aspirin use status from clinical notes with high accuracy. This method can be a valuable tool to supplement existing structured medication data.
Keywords: aspirin; electronic clinical notes; electronic medical record; integrated health systems; medication status; natural language processing.
Copyright © 2015 Elsevier HS Journals, Inc. All rights reserved.