Automated screening of natural language in electronic health records for the diagnosis septic shock is feasible and outperforms an approach based on explicit administrative codes

J Crit Care. 2020 Apr:56:203-207. doi: 10.1016/j.jcrc.2020.01.007. Epub 2020 Jan 9.

Abstract

Purpose: Identification of patients for epidemiologic research through administrative coding has important limitations. We investigated the feasibility of a search based on natural language processing (NLP) on the text sections of electronic health records for identification of patients with septic shock.

Materials and methods: We compared the results of two NLP search strategies, an explicit strategy (using explicit concept retrieval only) and a combined strategy (using both explicit and implicit concept retrieval), with hospital ICD-9-based administrative coding and with our department's own prospectively compiled infection database.
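The difference between explicit and combined retrieval can be illustrated with a toy sketch. The abstract does not give the study's search terms or NLP software, so everything below is an assumption made for illustration only: the term lists, the function names, and the rule that an "implicit" hit means an infection term and a vasopressor term co-occurring in the same note.

```python
import re

# Hypothetical term lists, chosen only for illustration. The study's actual
# concepts and search terms are not given in the abstract.
EXPLICIT = re.compile(r"\bseptic\s+shock\b", re.IGNORECASE)
INFECTION = re.compile(r"\b(sepsis|septic|bacteremia|infection)\b", re.IGNORECASE)
VASOPRESSOR = re.compile(
    r"\b(norepinephrine|noradrenaline|vasopressin|vasopressors?)\b", re.IGNORECASE
)


def explicit_hit(note: str) -> bool:
    """Explicit retrieval: the diagnosis is named literally in the note."""
    return EXPLICIT.search(note) is not None


def implicit_hit(note: str) -> bool:
    """Implicit retrieval (assumed rule): the diagnosis is not named, but an
    infection term and a vasopressor term both appear in the note."""
    return (
        INFECTION.search(note) is not None
        and VASOPRESSOR.search(note) is not None
    )


def combined_hit(note: str) -> bool:
    """Combined strategy: flag the note if either retrieval mode fires."""
    return explicit_hit(note) or implicit_hit(note)


notes = [
    "Patient admitted with septic shock.",
    "Sepsis from pneumonia, started on norepinephrine.",
    "Elective hip replacement, uneventful course.",
]
for note in notes:
    print(explicit_hit(note), combined_hit(note), note)
```

A combined rule of this kind flags more notes than the explicit rule alone, which is the general trade-off the study evaluates: more cases found at the cost of more false positives. The sketch ignores negation ("no signs of septic shock") and other context that a real NLP pipeline would need to handle.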

Results: Of 8911 patients admitted to the medical or surgical ICU, 1023 (11.5%) suffered from septic shock according to the combined search strategy. This was significantly more than the number identified by the explicit strategy (518, 5.8%), by hospital administrative coding (549, 6.2%) or by our own prospectively compiled database (609, 6.8%) (p < .001). Sensitivity and specificity of the automated combined search strategy were 72.7% (95% CI 69.0%-76.2%) and 93.0% (95% CI 92.4%-93.6%), compared to 56.0% (95% CI 52.0%-60.0%) and 97.5% (95% CI 97.1%-97.8%) for hospital administrative coding.
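The reported sensitivities and specificities can be checked with a short calculation. This is a sketch under stated assumptions, not the authors' analysis: it assumes the prospectively compiled database (609 cases) served as the reference standard, and the true-positive counts (443 and 341) are back-calculated from the reported percentages rather than taken from the abstract. The Wilson score interval is also a choice made here; the abstract does not state how its confidence intervals were computed, so the bounds may differ slightly from those reported.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sens_spec(tp: int, flagged: int, ref_pos: int, total: int) -> tuple[float, float]:
    """Sensitivity and specificity from a 2x2 table given its margins and TP."""
    fp = flagged - tp                 # flagged by the strategy, not in reference
    tn = (total - ref_pos) - fp       # reference negatives that were not flagged
    return tp / ref_pos, tn / (total - ref_pos)


TOTAL = 8911    # ICU admissions (from the abstract)
REF_POS = 609   # assumed reference standard: the department's database

# strategy -> (patients flagged, true positives)
# Flagged counts come from the abstract; true positives are back-calculated.
strategies = {
    "combined NLP search": (1023, 443),
    "administrative coding": (549, 341),
}

for name, (flagged, tp) in strategies.items():
    sens, spec = sens_spec(tp, flagged, REF_POS, TOTAL)
    lo, hi = wilson_interval(tp, REF_POS)
    print(
        f"{name}: flagged {100 * flagged / TOTAL:.1f}% of admissions, "
        f"sensitivity {100 * sens:.1f}% "
        f"(Wilson 95% CI {100 * lo:.1f}%-{100 * hi:.1f}%), "
        f"specificity {100 * spec:.1f}%"
    )
```

Under these assumptions the point estimates round to the reported 72.7% and 93.0% for the combined search and 56.0% and 97.5% for administrative coding, which suggests the figures in the abstract are internally consistent with the department database as the comparator.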

Conclusions: An automated search strategy based on a combination of explicit and implicit concept retrieval is feasible for screening electronic health records for septic shock and outperforms an explicit approach based on administrative coding.

Keywords: Epidemiology; ICD codes; Natural language processing; Septic shock.

MeSH terms

  • Adult
  • Databases, Factual
  • Electronic Health Records*
  • Epidemiologic Studies
  • False Positive Reactions
  • Female
  • Hospitalization*
  • Humans
  • Incidence
  • Intensive Care Units
  • International Classification of Diseases
  • Middle Aged
  • Natural Language Processing*
  • Pattern Recognition, Automated
  • Predictive Value of Tests
  • Prospective Studies
  • Sensitivity and Specificity
  • Shock, Septic / diagnosis*
  • Shock, Septic / prevention & control*
  • Software
  • Young Adult