Identifying health information technology related safety event reports from patient safety event report databases

J Biomed Inform. 2018 Oct:86:135-142. doi: 10.1016/j.jbi.2018.09.007. Epub 2018 Sep 10.

Abstract

Objective: The objective of this paper was to identify health information technology (HIT) related events from patient safety event (PSE) report free-text descriptions. A difference-based scoring approach was used to prioritize and select model features. A feature-constraint model was developed and evaluated to support the analysis of PSE reports.

Methods: 5287 PSE reports manually coded as likely or unlikely related to HIT were used to train unigram, bigram, and combined unigram-bigram logistic regression and support vector machine (SVM) models using five-fold cross validation. A difference-based scoring approach was used to prioritize and select unigram and bigram features by their relative importance to likely and unlikely HIT reports. A held-out set of 2000 manually coded reports was used for testing.
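The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one plausible difference-based score: for each unigram, the fraction of likely-HIT reports containing it minus the fraction of unlikely-HIT reports containing it, with features ranked by the absolute difference. The function names (`unigram_doc_freq`, `difference_scores`, `top_features`), the whitespace tokenizer, and the example reports are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def unigram_doc_freq(reports):
    """Return the fraction of reports in which each unigram appears at least once."""
    counts = Counter()
    for text in reports:
        # Assumption: simple lowercase whitespace tokenization.
        counts.update(set(text.lower().split()))
    n = len(reports)
    return {tok: c / n for tok, c in counts.items()}


def difference_scores(likely, unlikely):
    """Score each unigram by (share of likely reports) - (share of unlikely reports)."""
    p_likely = unigram_doc_freq(likely)
    p_unlikely = unigram_doc_freq(unlikely)
    vocab = set(p_likely) | set(p_unlikely)
    return {
        tok: p_likely.get(tok, 0.0) - p_unlikely.get(tok, 0.0)
        for tok in vocab
    }


def top_features(scores, k):
    """Keep the k unigrams with the largest absolute difference (ties alphabetical)."""
    return sorted(scores, key=lambda tok: (-abs(scores[tok]), tok))[:k]


# Hypothetical toy reports, for illustration only.
likely_hit = ["ehr order entry screen froze", "ehr alert did not fire"]
unlikely_hit = ["patient fell in hallway", "patient slipped on wet floor"]

scores = difference_scores(likely_hit, unlikely_hit)
selected = top_features(scores, 2)
```

Under this assumed definition, a constrained vocabulary such as the 300 unigrams reported in the Results would correspond to calling `top_features(scores, 300)` before fitting the classifier.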

Results: Unigram models tended to perform better than bigram and combined models. A 300-unigram logistic regression model had comparable classification performance to a 4030-unigram SVM model but with a faster relative run-time. The 300-unigram logistic regression model evaluated with the testing data had an AUC of 0.931 and an F1-score of 0.765.
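For readers less familiar with the two reported metrics, the sketch below computes them from first principles: AUC as the probability that a randomly chosen positive report is scored above a randomly chosen negative one (ties count as one half), and F1 as the harmonic mean of precision and recall. The labels, scores, and 0.5 decision threshold are made-up illustrative values and are unrelated to the study data.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = likely HIT-related) and model scores.
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

example_auc = auc(y_true, y_score)
example_f1 = f1_score(y_true, y_pred)
```

AUC is threshold-independent, while F1 depends on the chosen cutoff, which is one reason the two reported values (0.931 and 0.765) can differ substantially for the same model.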

Discussion: A difference-based scoring, prioritization, and feature selection approach can be used to generate simplified models with high performance. A feature-constraint model may be more easily shared across healthcare organizations seeking to analyze their respective datasets and customized for local variations in PSE reporting practices.

Conclusion: The feature-constraint model provides a method to identify HIT-related patient safety hazards that is applicable across healthcare systems with variability in their PSE report structures.

Keywords: Health information technology; Incident reports; Machine learning; Patient safety event reports; Text classification.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adverse Drug Reaction Reporting Systems
  • Algorithms
  • Area Under Curve
  • Data Collection*
  • Data Mining
  • Databases, Factual
  • Humans
  • Medical Informatics / methods*
  • Models, Statistical
  • Patient Safety*
  • Pennsylvania
  • Regression Analysis
  • Research Report
  • Support Vector Machine*