An Evaluation of Patient Safety Event Report Categories Using Unsupervised Topic Modeling

Methods Inf Med. 2015;54(4):338-45. doi: 10.3414/ME15-01-0010. Epub 2015 Apr 2.

Abstract

Objective: Patient safety event data repositories have the potential to dramatically improve safety if analyzed and leveraged appropriately. These safety event reports often consist of both structured data, such as general event type categories, and unstructured data, such as free-text descriptions of the event. Analyzing these data, particularly the rich free-text narratives, can be challenging, especially with tens of thousands of reports. To overcome the resource-intensive manual review of the free-text descriptions, we demonstrate the effectiveness of an unsupervised natural language processing approach.

Methods: An unsupervised natural language processing technique called topic modeling was applied to a large repository of patient safety event data to identify topics, or themes, from the free-text descriptions in the reports. Entropy measures were used to evaluate and compare these topics to the general event type categories that were originally assigned by the event reporter.
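The abstract does not give the exact entropy formulation, so the following is only a minimal sketch of one plausible reading: for each topic, compute the Shannon entropy of the reporter-assigned general event type categories among the reports assigned to that topic. Under this reading, entropy near zero suggests the topic aligns with a single existing category, while high entropy suggests a theme that cuts across categories. The function name `topic_category_entropy`, the category labels, and the toy data are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def topic_category_entropy(assignments):
    """Shannon entropy (bits) of reporter-assigned categories within each topic.

    `assignments` is an iterable of (topic_id, category) pairs, one per report.
    This is an assumed formulation, not necessarily the one used in the paper.
    """
    counts = defaultdict(Counter)
    for topic, category in assignments:
        counts[topic][category] += 1

    entropies = {}
    for topic, cat_counts in counts.items():
        total = sum(cat_counts.values())
        # H = sum over categories of p * log2(1 / p)
        entropies[topic] = sum(
            (n / total) * math.log2(total / n) for n in cat_counts.values()
        )
    return entropies


# Hypothetical toy data: (dominant topic of a report, category chosen by reporter).
assignments = [
    (0, "Medication"), (0, "Medication"), (0, "Medication"), (0, "Medication"),
    (1, "Fall"), (1, "Medication"), (1, "Lab"), (1, "Other"),
]

entropies = topic_category_entropy(assignments)
for topic, h in sorted(entropies.items()):
    print(f"topic {topic}: entropy = {h:.2f} bits")
```

In this toy input, topic 0 contains only one category and so has zero entropy, whereas topic 1 is spread evenly over four categories and reaches the maximum of 2 bits for four categories. In practice the topic assignments would come from a fitted topic model such as latent Dirichlet allocation, which is not shown here.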

Results: Measures of entropy demonstrated that some topics generated from the unsupervised modeling approach aligned with the clinical general event type categories that were originally selected by the individual entering the report. Importantly, several new latent topics emerged that were not originally identified. The new topics provide additional insights into the patient safety event data that would not otherwise easily be detected.

Conclusion: The topic modeling approach provides a method to identify topics or themes that may not be immediately apparent and has the potential to allow for automatic reclassification of events that are ambiguously classified by the event reporter.

Keywords: Patient safety event reports; general event type; latent Dirichlet allocation; natural language processing; topic model; unsupervised learning.

Publication types

  • Evaluation Study

MeSH terms

  • Humans
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / standards
  • Natural Language Processing*
  • Patient Safety*
  • Quality Improvement