Objective: We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts.
Materials and methods: Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus: lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) enabled learning either form of KE, as well as neural models capable of producing LKEs for mentions of biomedical concepts and relation types in text that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for the relation types of interest and their arguments.
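The abstract does not specify how the KEs are combined with textual representations, so the minimal PyTorch sketch below is an illustration only: it shows one plausible way a relation classifier could combine a text-derived representation of a candidate argument pair with KEs for the two argument concepts and for each candidate relation type. All names (KEAugmentedRelationClassifier, text_dim, ke_dim, and so on) are hypothetical and are not taken from REKE.

```python
# Hypothetical sketch (not the authors' REKE implementation): a relation
# classifier that augments a text-derived representation of a candidate
# argument pair with knowledge embeddings (KEs) for the two argument
# concepts and scores the pair against KEs for every candidate relation type.
import torch
import torch.nn as nn


class KEAugmentedRelationClassifier(nn.Module):
    def __init__(self, text_dim: int, ke_dim: int, num_relation_types: int,
                 hidden_dim: int = 256):
        super().__init__()
        # KEs for the candidate relation types (e.g., produced by a
        # knowledge embedding encoder trained on the UMLS Metathesaurus).
        self.relation_ke = nn.Embedding(num_relation_types, ke_dim)
        # Scores one (text, head-KE, tail-KE, relation-KE) combination.
        self.scorer = nn.Sequential(
            nn.Linear(text_dim + 3 * ke_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, text_repr, head_ke, tail_ke):
        # text_repr: (batch, text_dim) encoding of the sentence / argument pair
        # head_ke, tail_ke: (batch, ke_dim) KEs for the two argument concepts
        batch_size = text_repr.size(0)
        num_rel = self.relation_ke.num_embeddings
        rel_ke = self.relation_ke.weight.unsqueeze(0).expand(batch_size, -1, -1)
        # Tile the argument-pair representation across all relation types.
        pair = torch.cat([text_repr, head_ke, tail_ke], dim=-1)
        pair = pair.unsqueeze(1).expand(-1, num_rel, -1)
        logits = self.scorer(torch.cat([pair, rel_ke], dim=-1)).squeeze(-1)
        return logits  # (batch, num_relation_types)


if __name__ == "__main__":
    model = KEAugmentedRelationClassifier(text_dim=768, ke_dim=100,
                                          num_relation_types=8)
    scores = model(torch.randn(4, 768), torch.randn(4, 100), torch.randn(4, 100))
    print(scores.shape)  # torch.Size([4, 8])
```

Under this assumed design, swapping LKEs for unlexicalized KEs would only change how head_ke, tail_ke, and the relation-type embedding table are produced, not the classifier itself.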
Results: The incorporation of either LKEs or unlexicalized KEs in REKE advances the state of the art in relation extraction on 2 datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, the impact of LKEs is superior, achieving F1 scores of 78.2 and 82.0, respectively.
Discussion: REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but it also showcases the subtleties of incorporating KEs in relation extraction systems.
Conclusions: Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.
Keywords: deep learning; information extraction; medical informatics; unified medical language system.