Using multivariate long short-term memory neural network to detect aberrant signals in health data for quality assurance

Int J Med Inform. 2021 Mar:147:104368. doi: 10.1016/j.ijmedinf.2020.104368. Epub 2020 Dec 16.

Abstract

Background: The data quality of electronic health records (EHR) is a topic of increasing interest to clinical and health services researchers. One indicator of possible data errors is a large change in the frequency of observations for chronic illnesses. In this study, we built a stacked multivariate LSTM model to predict an acceptable range for the frequency of observations and demonstrated its utility.

Methods: We applied the LSTM approach to a large EHR dataset with over 400 million total encounters. We computed sensitivity and specificity for predicting whether the frequency of an observation in a given week is an aberrant signal.
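The evaluation described above can be illustrated with a minimal sketch. It assumes that a week is flagged as aberrant when its observed count falls outside a predicted acceptable range, and that ground-truth labels are available for each week. The function names, the weekly counts, the ranges, and the labels are all hypothetical and are not taken from the study.

```python
def flag_aberrant(observed, lower, upper):
    """Flag each week whose observed count lies outside [lower, upper]."""
    return [not (lo <= x <= hi) for x, lo, hi in zip(observed, lower, upper)]


def sensitivity_specificity(flags, truth):
    """Compute sensitivity and specificity from predicted flags and true labels."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical weekly counts for one diagnostic code (illustrative only).
observed = [100, 115, 98, 160, 101, 40]
# Hypothetical acceptable range per week, standing in for a model's prediction.
lower = [90, 92, 90, 91, 90, 92]
upper = [110, 112, 110, 111, 120, 112]
# Hypothetical ground truth: True marks a week with a real data quality issue.
truth = [False, False, False, True, False, True]

flags = flag_aberrant(observed, lower, upper)
sens, spec = sensitivity_specificity(flags, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data, both true aberrant weeks are flagged, and one normal week is flagged in error, which lowers specificity but not sensitivity.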

Results: Compared with the simple frequency monitoring approach, our proposed multivariate LSTM approach increased the sensitivity of finding aberrant signals in 6 randomly selected diagnostic codes from 75% to 88% and the specificity from 68% to 91%. We also experimented with two different LSTM algorithms, namely, direct multi-step and recursive multi-step. Both models were able to detect the aberrant signals, but the recursive multi-step algorithm performed better.
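The abstract does not describe how the two multi-step algorithms were implemented. As a rough illustration of the general distinction only, the sketch below contrasts the two strategies using a simple least-squares linear predictor swapped in for the LSTM. A recursive forecast feeds each one-step prediction back in as the next input, while a direct forecast (in one common reading) fits a separate predictor for each step of the horizon. All function names and the example series are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (a stand-in for a trained model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x else 0.0
    b = mean_y - a * mean_x
    return a, b


def recursive_forecast(series, horizon):
    """Fit one one-step-ahead model, then feed each prediction back as input."""
    a, b = fit_line(series[:-1], series[1:])
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = a * last + b
        preds.append(last)
    return preds


def direct_forecast(series, horizon):
    """Fit a separate model per step h, each mapping x[t] directly to x[t + h]."""
    preds = []
    for h in range(1, horizon + 1):
        a, b = fit_line(series[:-h], series[h:])
        preds.append(a * series[-1] + b)
    return preds


# Hypothetical weekly counts with a steady upward trend (illustrative only).
weekly_counts = [10, 12, 14, 16, 18, 20]
recursive_preds = recursive_forecast(weekly_counts, horizon=3)
direct_preds = direct_forecast(weekly_counts, horizon=3)
print(recursive_preds, direct_preds)
```

On a perfectly linear series the two strategies agree. On noisy data they generally differ, because recursive forecasts can compound one-step errors while direct forecasts train each horizon independently on fewer usable pairs.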

Conclusions: Simply monitoring the frequency trend, as is the common practice in systems that do monitor data quality, cannot distinguish between fluctuations caused by seasonal disease changes, seasonal patient visits, or a change in data sources. Our study demonstrated the ability of stacked multivariate LSTM models to distinguish true data quality issues from fluctuations with other causes, including seasonal changes and outbreaks.

Keywords: Electronic health records; Health data quality; LSTM models.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Electronic Health Records
  • Humans
  • Memory, Short-Term*
  • Neural Networks, Computer*