Detecting abbreviations in discharge summaries using machine learning methods

AMIA Annu Symp Proc. 2011;2011:1541-9. Epub 2011 Oct 22.

Abstract

Recognition and identification of abbreviations is an important and challenging task in clinical natural language processing (NLP). A comprehensive lexical resource composed of all common, useful clinical abbreviations would have broad applicability. The authors present a corpus-based method that uses machine learning (ML) to create a lexical resource of clinical abbreviations, and test its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly split into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. Subsequent evaluation on the test set showed that the Random Forest classifier achieved the highest F-measure of any single classifier, 94.8% (precision 98.8%, recall 91.2%). When a voting scheme was used to combine the output of several ML classifiers, the system achieved an F-measure of 95.7%.
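The abstract frames abbreviation detection as classifying tokens using "a list of pre-defined features" but does not enumerate them. The sketch below shows only the general shape of such a pipeline, turning each token into a numeric vector that a classifier such as a Random Forest (for example, scikit-learn's `RandomForestClassifier`) could consume. The features themselves (`length`, `has_period`, `has_digit`, `upper_ratio`, `in_dictionary`), the `tokenize` rule, and the toy word list are hypothetical illustrations, not the authors' feature set.

```python
from dataclasses import dataclass


# Hypothetical features chosen only for illustration; the abstract does not
# list the authors' pre-defined features.
@dataclass
class TokenFeatures:
    length: int          # number of characters in the token
    has_period: bool     # e.g. "b.i.d.", "pt."
    has_digit: bool      # e.g. "T4", "2x"
    upper_ratio: float   # share of letters that are uppercase, e.g. "SOB" -> 1.0
    in_dictionary: bool  # token found in a (hypothetical) English word list

    def as_vector(self) -> list[float]:
        """Numeric vector suitable as input to a classifier such as a Random Forest."""
        return [float(self.length), float(self.has_period), float(self.has_digit),
                self.upper_ratio, float(self.in_dictionary)]


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: split on whitespace, strip surrounding , ; : ( )."""
    tokens = []
    for raw in text.split():
        tok = raw.strip(",;:()")
        if tok:
            tokens.append(tok)
    return tokens


def extract_features(token: str, dictionary: set[str]) -> TokenFeatures:
    """Compute the illustrative feature set for a single token."""
    letters = [c for c in token if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return TokenFeatures(
        length=len(token),
        has_period="." in token,
        has_digit=any(c.isdigit() for c in token),
        upper_ratio=upper / len(letters) if letters else 0.0,
        # Lowercase and drop trailing periods before the dictionary lookup.
        in_dictionary=token.lower().rstrip(".") in dictionary,
    )


if __name__ == "__main__":
    words = {"patient", "denies", "on"}  # toy word list, not a real lexicon
    for tok in tokenize("Patient denies SOB, on ASA 81 mg b.i.d."):
        print(tok, extract_features(tok, words).as_vector())
```

In a real system the dictionary lookup would use a full English lexicon, and each vector would be paired with the expert annotation (abbreviation or not) to train the classifier.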
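The abstract reports that combining classifiers with "a voting scheme" raised the F-measure, without specifying the rule. A minimal sketch, assuming a simple (hard) majority vote over binary per-token labels, where 1 means "abbreviation". The function name `majority_vote` and the tie-breaking rule are hypothetical; scikit-learn's `VotingClassifier` with `voting="hard"` offers a library implementation of the same idea.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine binary token labels from several classifiers by majority vote.

    predictions[i][j] is classifier i's label (1 = abbreviation, 0 = not)
    for token j. A token is labeled 1 only if more than half of the
    classifiers say so; ties (possible with an even number of classifiers)
    resolve to 0, which is a hypothetical choice.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all classifiers must label the same tokens")

    combined = []
    for j in range(n_tokens):
        votes = sum(p[j] for p in predictions)
        combined.append(1 if 2 * votes > len(predictions) else 0)
    return combined


if __name__ == "__main__":
    # Three hypothetical classifiers labeling the same four tokens.
    outputs = [[1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 0, 1, 1]]
    print(majority_vote(outputs))  # [1, 0, 1, 1]
```

An odd number of classifiers avoids ties entirely; weighted or probability-averaged ("soft") voting are common alternatives.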
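The reported figures are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall: 2 × 0.988 × 0.912 / (0.988 + 0.912) ≈ 0.948. The sketch below, assuming binary labels compared one-to-one (the abstract does not state the unit of evaluation), computes precision, recall, and F1 from gold and predicted labels and reproduces that arithmetic. The function names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(gold: list[int], predicted: list[int]) -> tuple[float, float, float]:
    """Compare binary labels (1 = abbreviation) item by item."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Random Forest figures from the abstract: P = 98.8%, R = 91.2%.
    print(round(f_measure(0.988, 0.912), 3))  # 0.948
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why the 91.2% recall limits the Random Forest result despite its 98.8% precision.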

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Abbreviations as Topic*
  • Algorithms*
  • Artificial Intelligence*
  • Decision Trees
  • Electronic Health Records*
  • Humans
  • Natural Language Processing*
  • Patient Discharge
  • Pattern Recognition, Automated
  • Support Vector Machine