A model for indexing medical documents combining statistical and symbolic knowledge

AMIA Annu Symp Proc. 2007 Oct 11;2007:31-5.

Abstract

Objectives: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context.

Methods: We designed a model that combines symbolic general knowledge extracted from the Unified Medical Language System (UMLS) with statistical knowledge extracted from the domain of application. The statistical knowledge allowed us to contextualize the general knowledge for each particular situation. For each document studied, the extracted terms were ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs).
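The abstract does not give the scoring formula, so the following is only a minimal sketch of how a symbolic weight and a corpus-derived statistical weight might be combined to rank the terms of one document. The function name `rank_terms`, the `symbolic_weight` mapping, the smoothed inverse document frequency, and the multiplicative combination are all assumptions made for illustration, not the authors' method. The codes used in the example are ordinary ICD-10 codes chosen arbitrarily.

```python
import math
from collections import Counter


def rank_terms(doc_terms, corpus, symbolic_weight):
    """Rank the terms of one document, most significant first.

    doc_terms:       list of term codes extracted from the document
    corpus:          list of documents, each a list of term codes
                     (stands in for the statistical, domain-specific knowledge)
    symbolic_weight: dict mapping a term code to a weight
                     (hypothetical stand-in for knowledge derived from a
                     terminology; terms missing from the dict get 1.0)
    """
    n_docs = len(corpus)
    # Number of corpus documents in which each term appears at least once.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    scores = {}
    for term in set(doc_terms):
        # Smoothed inverse document frequency: rarer terms score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        # Assumed combination rule: symbolic weight times statistical weight.
        scores[term] = symbolic_weight.get(term, 1.0) * idf

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy corpus of three documents indexed with ICD-10 codes.
corpus = [["E11", "I10"], ["I10"], ["I10", "J45"]]
ranking = rank_terms(["E11", "I10"], corpus, symbolic_weight={})
```

With no symbolic weights, the rarer code `E11` outranks `I10`, which occurs in every document of the toy corpus. Giving `I10` a large enough symbolic weight reverses the order, which is the kind of interplay between the two knowledge sources the model relies on.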

Results: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases.
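The reported figure is a top-2 hit rate: the share of documents whose reference term appears in the first two positions of the ranking. A minimal sketch of that computation follows; the function name `top_k_hit_rate` and the input layout (one ranked list and one reference term per document) are assumptions for illustration, and the data are invented toy values, not the study's.

```python
def top_k_hit_rate(rankings, gold_terms, k=2):
    """Fraction of documents whose gold term is among the top k ranked terms.

    rankings:   list of ranked term lists, one per document (best first)
    gold_terms: list with the reference term for each document, same order
    """
    if len(rankings) != len(gold_terms):
        raise ValueError("rankings and gold_terms must have the same length")
    if not rankings:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(rankings, gold_terms) if gold in ranked[:k]
    )
    return hits / len(rankings)


# Three toy documents; the reference term is "E11" for each.
rankings = [["E11", "I10", "J45"], ["I10", "E11"], ["J45", "I10", "E11"]]
rate = top_k_hit_rate(rankings, ["E11", "E11", "E11"], k=2)
```

In this toy case the reference term is ranked 1st, 2nd, and 3rd respectively, so two of the three documents count as hits.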

Conclusions: The use of several terminologies leads to more precise indexing. The improvement in the performance of the model's implementation that results from using semantic relationships is encouraging.

Publication types

  • Evaluation Study

MeSH terms

  • Abstracting and Indexing / methods*
  • Humans
  • Information Storage and Retrieval
  • International Classification of Diseases
  • Medical Records
  • Natural Language Processing*
  • Patient Discharge
  • Statistics as Topic
  • Unified Medical Language System
  • Vocabulary, Controlled*