A Framework of Rebalancing Imbalanced Healthcare Data for Rare Events' Classification: A Case of Look-Alike Sound-Alike Mix-Up Incident Detection

J Healthc Eng. 2018 May 22:2018:6275435. doi: 10.1155/2018/6275435. eCollection 2018.

Abstract

Identifying rare but significant healthcare events in massive unstructured datasets has become a common task in healthcare data analytics. However, imbalanced class distribution in many practical datasets greatly hampers the detection of rare events, as most classification methods implicitly assume an equal occurrence of classes and are designed to maximize the overall classification accuracy. In this study, we develop a framework for learning healthcare data with imbalanced distribution via incorporating different rebalancing strategies. The evaluation results showed that the developed framework can significantly improve the detection accuracy of medical incidents due to look-alike sound-alike (LASA) mix-ups. Specifically, logistic regression combined with the synthetic minority oversampling technique (SMOTE) produces the best detection results, with a significant 45.3% increase in recall (recall = 75.7%) compared with pure logistic regression (recall = 52.1%).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Data Accuracy
  • Data Collection
  • Databases, Factual
  • Electronic Health Records*
  • Humans
  • Logistic Models
  • Medical Errors / statistics & numerical data*
  • Medical Informatics / methods*
  • Models, Statistical
  • Programming Languages
  • Regression Analysis
  • Reproducibility of Results