A vocabulary development and visualization tool based on natural language processing and the mining of textual patient reports

J Biomed Inform. 2003 Jun;36(3):189-201. doi: 10.1016/j.jbi.2003.08.005.

Abstract

Medical terminologies are critical for automated healthcare systems. Some terminologies, such as the UMLS and SNOMED, are comprehensive, whereas others specialize in limited domains (e.g., BIRADS) or are developed for specific applications. Important features of a terminology are comprehensive coverage of relevant clinical terms and ease of use by its users, which include computerized applications. We have developed a method for facilitating vocabulary development and maintenance that uses natural language processing to mine large collections of clinical reports for information on terminology as expressed by physicians. Once the reports are processed and the terms are structured and collected into an XML representational schema, it is possible to determine information about terms, such as frequency of occurrence, compositionality, relations to other terms (such as modifiers), and correspondence to a controlled vocabulary. This paper describes the method and discusses how it can be used as a tool to help vocabulary builders navigate through the terms physicians use, visualize their relations to other terms via a flexible viewer, and determine their correspondence to a controlled vocabulary.
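
To make the kind of term information described above more concrete, the following is a minimal sketch in Python. It is not the paper's implementation. It assumes a purely hypothetical XML layout in which each `finding` element carries a `term` attribute, an optional `code` attribute for its controlled-vocabulary entry, and nested `modifier` elements. All element names, attribute names, the `summarize_terms` helper, and the placeholder code `CODE-001` are invented for illustration. Under those assumptions, it tallies term frequency, collects the modifiers seen with each term, and lists terms that have no controlled-vocabulary correspondence.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical structured NLP output for two reports. The schema and the
# code value "CODE-001" are placeholders, not taken from the paper.
SAMPLE_XML = """
<reports>
  <report id="r1">
    <finding term="mass" code="CODE-001">
      <modifier type="size" value="small"/>
      <modifier type="location" value="left breast"/>
    </finding>
    <finding term="calcification">
      <modifier type="certainty" value="possible"/>
    </finding>
  </report>
  <report id="r2">
    <finding term="mass" code="CODE-001">
      <modifier type="size" value="large"/>
    </finding>
  </report>
</reports>
"""


def summarize_terms(xml_text):
    """Return (frequency, modifiers, codes, unmapped) for the given XML."""
    root = ET.fromstring(xml_text.strip())
    frequency = Counter()             # term -> number of occurrences
    modifiers = defaultdict(Counter)  # term -> Counter of (type, value) pairs
    codes = {}                        # term -> controlled-vocabulary code

    for finding in root.iter("finding"):
        term = finding.get("term")
        frequency[term] += 1
        code = finding.get("code")
        if code is not None:
            codes[term] = code
        for mod in finding.findall("modifier"):
            modifiers[term][(mod.get("type"), mod.get("value"))] += 1

    # Terms physicians used that have no controlled-vocabulary entry.
    unmapped = sorted(t for t in frequency if t not in codes)
    return frequency, modifiers, codes, unmapped


frequency, modifiers, codes, unmapped = summarize_terms(SAMPLE_XML)
for term, count in frequency.most_common():
    print(term, count, codes.get(term, "(no code)"), dict(modifiers[term]))
print("unmapped terms:", unmapped)
```

In this sketch, a vocabulary builder could start from the `unmapped` list to find frequently used terms that a controlled vocabulary does not yet cover.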

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Computer Graphics
  • Database Management Systems*
  • Hypermedia
  • Information Storage and Retrieval / methods*
  • Medical Records Systems, Computerized*
  • Natural Language Processing
  • Programming Languages
  • Terminology as Topic*
  • User-Computer Interface*
  • Vocabulary, Controlled*