Text retrieval based on medical subwords

Stud Health Technol Inform. 2002;90:241-5.

Abstract

In biomedical documents, there is ample evidence of complex morphological structures in specialized terms. While inflection is relatively easy to deal with, productive morphological processes such as derivation and single-word composition constitute a major challenge. Approaching the problem from an information retrieval perspective, we split morphologically complex words into biomedically significant, morpheme-like subwords and then match the subwords of which query terms and document terms are composed. In this way, morphologically motivated variations in word form can be eliminated from the retrieval procedure. Based on a series of retrieval experiments, we have gathered evidence that subword-based indexing and retrieval outperforms conventional string matching approaches, at least for the German biomedical sublanguage.
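To make the subword idea concrete, here is a minimal sketch under stated assumptions. It does not reproduce the authors' system: the abstract does not give their lexicon, segmentation rules, or ranking function. Instead it assumes a greedy left-to-right longest-match segmenter, a tiny hypothetical lexicon `SUBWORDS`, and a simple overlap score. All names (`segment`, `subword_set`, `subword_score`, `string_score`) are illustrative. The sketch compares subword matching with plain token string matching on a German compound.

```python
import re

# Hypothetical toy lexicon of German medical subwords (illustration only;
# not the lexicon used in the paper).
SUBWORDS = {"herz", "muskel", "entzünd", "ung", "leber", "zell", "schwäche"}


def segment(word: str, lexicon: set[str]) -> list[str]:
    """Split a word into subwords by greedy left-to-right longest match."""
    word = word.lower()
    max_len = max((len(s) for s in lexicon), default=0)
    parts: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest lexicon entry that starts at position i.
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in lexicon:
                parts.append(piece)
                i += length
                break
        else:
            # No known subword starts here: keep the remainder as one unit.
            parts.append(word[i:])
            break
    return parts


def tokens(text: str) -> list[str]:
    """Lowercase word tokens (\\w also matches umlauts such as 'ü')."""
    return re.findall(r"\w+", text.lower())


def subword_set(text: str, lexicon: set[str]) -> set[str]:
    """Index a text as the set of subwords its tokens decompose into."""
    return {sw for tok in tokens(text) for sw in segment(tok, lexicon)}


def subword_score(query: str, document: str, lexicon: set[str]) -> float:
    """Fraction of the query's subwords that also occur in the document."""
    q = subword_set(query, lexicon)
    d = subword_set(document, lexicon)
    return len(q & d) / len(q) if q else 0.0


def string_score(query: str, document: str) -> float:
    """Baseline: fraction of query tokens matched verbatim in the document."""
    q = set(tokens(query))
    d = set(tokens(document))
    return len(q & d) / len(q) if q else 0.0


if __name__ == "__main__":
    query = "Herzmuskelentzündung"
    doc = "Entzündung des Herzmuskels"
    print(segment(query, SUBWORDS))              # ['herz', 'muskel', 'entzünd', 'ung']
    print(subword_score(query, doc, SUBWORDS))   # 1.0
    print(string_score(query, doc))              # 0.0
```

Here the compound query "Herzmuskelentzündung" fails to match the paraphrase "Entzündung des Herzmuskels" by exact string comparison. It does match once both sides are reduced to subwords. This toy version weights every subword equally, including affixes such as "ung". A practical system would need a richer lexicon and a more careful treatment of such units.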

MeSH terms

  • Germany
  • Information Storage and Retrieval*
  • Medical Informatics*
  • Terminology as Topic*