A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

Qi Li; Haijun Zhai; Louise Deleger; Todd Lingren; Megan Kaiser; Laura Stoutenborough; Imre Solti

doi:10.1136/amiajnl-2012-001487

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):915-21. doi: 10.1136/amiajnl-2012-001487. Epub 2012 Dec 25.

Authors

Qi Li¹, Haijun Zhai, Louise Deleger, Todd Lingren, Megan Kaiser, Laura Stoutenborough, Imre Solti

Affiliation

¹ Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Abstract

Objective: The goal of this work was to evaluate two machine learning methods, binary classification and sequence labeling, for detecting medication-attribute linkages in two clinical corpora.

Data and methods: We double-annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.
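As an illustration of the binary classification framing, linkage detection can be cast as a yes/no decision over candidate (medication, attribute) pairs. The sketch below only generates candidate pairs and a few simple features. The `Entity` class, the attribute labels, the example sentence, and the features themselves are assumptions made for illustration; the abstract does not list the features in the parsimonious set.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # "MEDICATION" or an attribute type such as "DOSE" (hypothetical labels)
    start: int  # index of the first token
    end: int    # index one past the last token


def candidate_pairs(entities):
    """Pair every medication with every attribute found in the same sentence."""
    meds = [e for e in entities if e.label == "MEDICATION"]
    attrs = [e for e in entities if e.label != "MEDICATION"]
    return list(product(meds, attrs))


def pair_features(med, attr, entities):
    """Hypothetical features for one candidate pair (not the paper's feature set)."""
    lo, hi = sorted((med.start, attr.start))
    between = [e for e in entities if lo < e.start < hi]
    if attr.start >= med.end:
        distance = attr.start - med.end   # attribute follows the medication
    else:
        distance = med.start - attr.end   # attribute precedes the medication
    return {
        "attr_type": attr.label,
        "token_distance": distance,
        "attr_follows_med": attr.start > med.start,
        "meds_between": sum(e.label == "MEDICATION" for e in between),
    }


# Invented sentence: "aspirin 81 mg daily and metoprolol 25 mg"
entities = [
    Entity("aspirin", "MEDICATION", 0, 1),
    Entity("81 mg", "DOSE", 1, 3),
    Entity("daily", "FREQUENCY", 3, 4),
    Entity("metoprolol", "MEDICATION", 5, 6),
    Entity("25 mg", "DOSE", 6, 8),
]
pairs = candidate_pairs(entities)
feature_rows = [pair_features(m, a, entities) for m, a in pairs]
```

Each feature dictionary would then be vectorized and passed to a binary classifier (an SVM in the study), which labels the pair as linked or not linked.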

Results: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.
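For reference, an F-measure over predicted linkages can be computed as in the minimal sketch below. It assumes the balanced F-measure (F1) and exact matching of (medication, attribute) pairs against the gold standard; the abstract does not state the matching criteria, and the example links are invented.

```python
def precision_recall_f1(gold_links, predicted_links):
    """Score predicted (medication, attribute) links against gold-standard links."""
    gold, pred = set(gold_links), set(predicted_links)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: two of three predicted links match the gold standard.
gold = {("aspirin", "81 mg"), ("aspirin", "daily"), ("metoprolol", "25 mg")}
predicted = {("aspirin", "81 mg"), ("metoprolol", "25 mg"), ("metoprolol", "daily")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```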

Discussion and conclusions: We compared the novel MLSL method with a binary classification method and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.

Keywords: attribute linkages; clinical notes; clinical trial announcements; multi-layered sequence labeling; natural language processing.

Publication types

Evaluation Study
Research Support, N.I.H., Extramural

MeSH terms

Artificial Intelligence*
Clinical Trials as Topic
Humans
Information Storage and Retrieval / methods*
Medical Records*
Pharmaceutical Preparations*
Support Vector Machine*

Substances

Pharmaceutical Preparations
