SEMCARE: Multilingual Semantic Search in Semi-Structured Clinical Data

Stud Health Technol Inform. 2016;223:93-9.

Abstract

The vast amount of clinical data in electronic health records offers great potential for secondary use. However, most of this content consists of unstructured or semi-structured texts, which are difficult to process. Several challenges remain, notably the idiosyncrasies of medical language across different natural languages and the large variety of medical terminology systems. In this paper we present SEMCARE, a European initiative designed to minimize these problems by providing a multilingual platform (English, German, and Dutch) that allows users to express complex queries and obtain relevant search results from clinical texts. SEMCARE is based on a selection of adapted biomedical terminologies, together with Apache UIMA and Apache Solr as open-source, state-of-the-art natural language processing pipeline and indexing technologies. SEMCARE has been deployed and is currently being tested at three medical institutions in the UK, Austria, and the Netherlands, showing promising results in a cardiology use case.

MeSH terms

  • Data Mining / methods*
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods
  • Language
  • Linguistics / methods
  • Natural Language Processing
  • Semantics