Parenthetically speaking: classifying the contents of parentheses for text mining

AMIA Annu Symp Proc. 2011;2011:267-72. Epub 2011 Oct 22.

Abstract

The contents of parentheses in biomedical text have many potential uses in text mining applications. However, making use of them requires the ability to determine what class of content each parenthetical holds. A system that automatically classifies parenthesized text into one of 20 categories is presented and evaluated here. It achieves a micro-averaged accuracy of 68% and a macro-averaged accuracy of 60% on an annotated corpus. The application is available as a Java class and as a Perl module.
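The difference between the two reported figures can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that micro-averaged accuracy is the fraction of all instances labeled correctly, and that macro-averaged accuracy is the unweighted mean of the per-category accuracies; the abstract does not spell out either definition. The function name and the category labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def micro_macro_accuracy(gold, predicted):
    """Return (micro, macro) accuracy for parallel lists of labels.

    Assumed definitions (not taken from the paper):
      micro = correct instances / all instances
      macro = unweighted mean of per-category accuracy, where categories
              are those that occur in the gold labels
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labeled instance is required")

    total_per_class = defaultdict(int)    # gold instances per category
    correct_per_class = defaultdict(int)  # correctly labeled per category

    for g, p in zip(gold, predicted):
        total_per_class[g] += 1
        if g == p:
            correct_per_class[g] += 1

    micro = sum(correct_per_class.values()) / len(gold)
    macro = sum(
        correct_per_class[c] / n for c, n in total_per_class.items()
    ) / len(total_per_class)
    return micro, macro


# Hypothetical category names: one frequent class, one rare class.
gold = ["citation"] * 8 + ["abbreviation"] * 2
predicted = ["citation"] * 9 + ["abbreviation"]

micro, macro = micro_macro_accuracy(gold, predicted)
print(f"micro={micro:.2f} macro={macro:.2f}")
```

Under these assumptions, a macro-averaged score below the micro-averaged one, as in the reported 60% versus 68%, is what one would expect when the system does better on frequent categories than on rare ones, since the macro average weights every category equally regardless of size.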

MeSH terms

  • Data Mining*
  • Natural Language Processing*
  • Periodicals as Topic*
  • Software