Extracting similar terms from multiple EMR-based semantic embeddings to support chart reviews

J Biomed Inform. 2018 Jul:83:63-72. doi: 10.1016/j.jbi.2018.05.014. Epub 2018 May 22.

Abstract

Objective: Word embeddings project semantically similar terms into nearby points in a vector space. When trained on clinical text, these embeddings can be leveraged to improve keyword search and text highlighting. In this paper, we present methods to refine the selection process of similar terms from multiple EMR-based word embeddings, and evaluate their performance quantitatively and qualitatively across multiple chart review tasks.

Materials and methods: Word embeddings were trained on each clinical note type in an EMR. These embeddings were then combined, weighted, and truncated to select a refined set of similar terms to be used in keyword search and text highlighting. To evaluate their quality, we measured the similar terms' information retrieval (IR) performance using precision-at-K (P@5, P@10). Additionally, a user study evaluated users' search term preferences, while a timing study measured the time to answer a question from a clinical chart.
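
The combine, weight, and truncate steps can be illustrated with a minimal sketch. The abstract does not specify how the embeddings were combined or weighted, so everything here is an assumption made for illustration: the function name `similar_terms`, the per-note-type weights, the use of a weighted sum of cosine similarities, and the toy two-dimensional vectors are all hypothetical and are not the authors' method.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_terms(query, embeddings, weights, top_n=5):
    """Rank candidate terms for `query` across several embeddings.

    embeddings: {note_type: {term: vector}}  (one embedding per note type)
    weights:    {note_type: float}           (hypothetical per-embedding weights)
    """
    scores = defaultdict(float)
    for note_type, vectors in embeddings.items():
        if query not in vectors:
            continue  # this note type never saw the query term
        q = vectors[query]
        for term, vec in vectors.items():
            if term != query:
                # Combine and weight: weighted sum of similarities (an assumption).
                scores[term] += weights.get(note_type, 1.0) * cosine(q, vec)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]  # truncate to the top_n candidates


# Toy data, invented purely for illustration.
toy_embeddings = {
    "progress_note": {
        "mi": [1.0, 0.0],
        "infarction": [0.9, 0.1],
        "fracture": [0.0, 1.0],
    },
    "discharge_summary": {
        "mi": [0.0, 1.0],
        "infarction": [0.1, 0.9],
        "stemi": [0.2, 0.8],
    },
}
toy_weights = {"progress_note": 0.6, "discharge_summary": 0.4}

top_terms = similar_terms("mi", toy_embeddings, toy_weights, top_n=2)
print([term for term, _ in top_terms])
```

In this sketch a term that is close to the query in several note types accumulates a higher score than one that appears in only a single embedding, which is one plausible reading of why combining embeddings could refine the candidate list.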

Results: The refined terms outperformed the baseline method's information retrieval performance (e.g., increasing the average P@5 from 0.48 to 0.60). Additionally, the refined terms were preferred by most users, and reduced the average time to answer a question.
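
For readers unfamiliar with the metric, precision-at-K is the fraction of the top K returned items that are judged relevant. The following is a minimal sketch under stated assumptions: the function name `precision_at_k` and the example terms and relevance judgments are hypothetical, and dividing by K even when fewer than K items are returned is one common convention, not necessarily the one used in the study.

```python
def precision_at_k(ranked_terms, relevant, k):
    """Fraction of the top-k ranked terms that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in relevant)
    return hits / k  # divides by k even if fewer than k terms were returned


# Invented ranking and relevance judgments, for illustration only.
ranked = ["infarction", "stemi", "fracture", "nstemi", "cough"]
relevant = {"infarction", "stemi", "nstemi"}

p_at_5 = precision_at_k(ranked, relevant, 5)
print(p_at_5)
```

Here three of the five returned terms are relevant, so P@5 for this single query is 3/5; an average P@5 such as those reported above would be the mean of this value over all evaluated queries.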

Conclusions: Clinical information can be more quickly retrieved and synthesized when using semantically similar terms from multiple embeddings.

Keywords: Clinical similar terms; Electronic medical records (EMR); Highlighting; Query expansion; Search engines; Semantic embeddings.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Electronic Health Records*
  • Information Storage and Retrieval / methods*
  • Medical Informatics*
  • Semantics*