Information extraction from Italian medical reports: An ontology-driven approach

Int J Med Inform. 2018 Mar:111:140-148. doi: 10.1016/j.ijmedinf.2017.12.013. Epub 2017 Dec 23.

Abstract

Objective: In this work, we propose an ontology-driven approach to identify events and their attributes in the episodes of care described in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible.

Materials and methods: The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports.
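
As an illustration of the idea described above, the following minimal Python sketch shows how an ontology that pairs events and attributes with regular expressions could drive extraction. It is not the authors' implementation: the EventSpec and AttributeSpec classes, the ECG event, the attribute names, the specific regular expressions, the sentence-level scoping, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class AttributeSpec:
    """Hypothetical ontology entry for one attribute of an event."""
    name: str
    pattern: re.Pattern  # the first capture group holds the attribute value


@dataclass
class EventSpec:
    """Hypothetical ontology entry for one clinical event."""
    name: str
    trigger: re.Pattern  # expression that signals the event in the text
    attributes: list


# Toy ontology with a single event; the real ontology is not reproduced here.
ONTOLOGY = [
    EventSpec(
        name="ECG",
        trigger=re.compile(r"\b(?:ECG|elettrocardiogramma)\b", re.IGNORECASE),
        attributes=[
            AttributeSpec(
                "heart_rate_bpm",
                re.compile(r"\bFC\s*(?:di\s*)?(\d{2,3})\s*bpm\b", re.IGNORECASE),
            ),
            AttributeSpec(
                "qtc_ms",
                re.compile(r"\bQTc\s*(?:di\s*)?(\d{3})\s*ms(?:ec)?\b", re.IGNORECASE),
            ),
        ],
    ),
]


def extract_events(text):
    """Return the events found in `text`, each linked to its matched attributes.

    Assumption: attributes are searched only in the sentence that contains
    the event trigger, which is a simplification made for this sketch.
    """
    results = []
    for sentence in re.split(r"(?<=[.;])\s+", text):
        for event in ONTOLOGY:
            if not event.trigger.search(sentence):
                continue
            found = {}
            for attr in event.attributes:
                match = attr.pattern.search(sentence)
                if match:
                    found[attr.name] = match.group(1)
            results.append({"event": event.name, "attributes": found})
    return results


# Invented example sentence, not taken from the corpus.
report = "ECG: ritmo sinusale, FC 62 bpm, QTc 450 ms. Nessuna terapia in corso."
events = extract_events(report)
```

Keeping the regular expressions inside the ontology entries, rather than in the extraction code, is what would allow such a resource to be enriched or translated without changing the extraction logic.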

Results: The proposed approach performs well on the considered Italian medical corpus, with a percentage of correct annotations above 90% for most of the considered clinical events. We also assessed the possibility of adapting the system to the analysis of another language (i.e., English), with promising results.
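
A percentage of correct annotations per event type can be computed by comparing extracted values with gold-standard records. The sketch below assumes a simple exact-match criterion and a hypothetical key layout (report id, event type, attribute); the abstract does not specify the matching rules actually used, and the toy values are invented and are not results from the study.

```python
from collections import defaultdict


def percent_correct(extracted, gold):
    """Percentage of gold-standard entries reproduced exactly, per event type.

    Both arguments map (report_id, event_type, attribute) to a value.
    Exact string equality is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for key, gold_value in gold.items():
        event_type = key[1]
        totals[event_type] += 1
        if extracted.get(key) == gold_value:
            correct[event_type] += 1
    return {event: 100.0 * correct[event] / totals[event] for event in totals}


# Invented toy data for illustration only.
gold = {
    ("r1", "ECG", "heart_rate_bpm"): "62",
    ("r2", "ECG", "heart_rate_bpm"): "75",
    ("r1", "Holter", "duration_h"): "24",
}
extracted = {
    ("r1", "ECG", "heart_rate_bpm"): "62",
    ("r2", "ECG", "heart_rate_bpm"): "57",
    ("r1", "Holter", "duration_h"): "24",
}
scores = percent_correct(extracted, gold)
```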

Discussion and conclusion: Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the considered task. In the future, it could be successfully used to automatically populate the TRIAD database.

Keywords: Information extraction; Natural language processing.

MeSH terms

  • Databases, Factual
  • Documentation / methods*
  • Humans
  • Information Storage and Retrieval*
  • Italy
  • Medical Record Linkage / methods*
  • Medical Records*
  • Natural Language Processing*