Enhancing HMM-based biomedical named entity recognition by studying special phenomena

J Biomed Inform. 2004 Dec;37(6):411-22. doi: 10.1016/j.jbi.2004.08.005.

Abstract

The purpose of this research is to enhance an HMM-based named entity recognizer in the biomedical domain. First, we analyze the characteristics of biomedical named entities. Then, we propose a rich set of features, including orthographic, morphological, part-of-speech, and semantic trigger features. All these features are integrated via a Hidden Markov Model with back-off modeling. Furthermore, we propose a method for biomedical abbreviation recognition and two methods for cascaded named entity recognition. Evaluation on GENIA V3.02 and V1.1 shows that our system achieves F-measures of 66.5 and 62.5, respectively, and outperforms the previous best published system by 8.1 F-measure points in the same experimental setting. The major contribution of this paper lies in its rich feature set, specially designed for the biomedical domain, and its effective methods for abbreviation and cascaded named entity recognition. To the best of our knowledge, our system is the first to cope with the cascaded phenomena.
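
As a rough illustration of what an orthographic feature might look like, the following minimal Python sketch maps a token to a coarse surface-form class. The class names (AllDigits, AllCaps, AlphaDigit, MixedCaps, InitCap, Lower, Other) and the function name orthographic_feature are assumptions made for this example; the abstract does not list the actual feature inventory used in the paper.

```python
import re


def orthographic_feature(token: str) -> str:
    """Return a coarse orthographic class for a token.

    The classes here are hypothetical and chosen for illustration only;
    they are not taken from the paper.
    """
    if re.fullmatch(r"\d+", token):
        return "AllDigits"
    if re.fullmatch(r"[A-Z]+", token):
        return "AllCaps"
    # Letters and digits together, e.g. "IL-2" or "p53".
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "AlphaDigit"
    # A lowercase letter plus an uppercase letter after the first character,
    # e.g. "NF-kappaB".
    if re.search(r"[a-z]", token) and re.search(r"[A-Z]", token[1:]):
        return "MixedCaps"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "InitCap"
    if re.fullmatch(r"[a-z]+", token):
        return "Lower"
    return "Other"


if __name__ == "__main__":
    for tok in ["IL-2", "NF-kappaB", "DNA", "Protein", "cells", "1995"]:
        print(tok, orthographic_feature(tok))
```

In an HMM-based tagger, a class like this would typically be one component of each token's observation, alongside morphological, part-of-speech, and trigger-word features; how the paper combines them is not specified in the abstract.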

MeSH terms

  • Abbreviations as Topic
  • Abstracting and Indexing / methods*
  • Algorithms
  • Animals
  • Artificial Intelligence
  • Biology / methods
  • Computational Biology / methods*
  • Database Management Systems
  • Databases as Topic
  • Databases, Bibliographic
  • Humans
  • Information Storage and Retrieval / methods*
  • Language
  • Markov Chains
  • Models, Statistical
  • Names
  • Natural Language Processing
  • Software
  • Terminology as Topic