Background and objectives: The importance of clinical natural language processing (NLP) has increased with the adoption of electronic health records (EHRs). One of the critical tasks in clinical NLP is named entity recognition (NER). Clinical NER in the Serbian language is a severely under-researched area. The few approaches that have been proposed so far are based on rules or machine-learning models with hand-crafted features, while current state-of-the-art models have not been explored. The objective of this paper is to assess the performance of state-of-the-art NER methods on clinical narratives in the Serbian language.
Materials and methods: We designed an experimental setup for a comprehensive evaluation of state-of-the-art NER models. The gold standard corpus we used for the evaluation consists of discharge summaries from the Clinic for Nephrology at the University Clinical Center of Serbia. The following models were evaluated: conditional random fields (CRF), multilingual transformers (BERT Multilingual and XLM-RoBERTa), long short-term memory (LSTM) recurrent neural networks, and their ensembles. In addition, we investigated the necessity of the pretraining task for transformer-based models and the use of pretrained word embeddings with the LSTM model.
Results: Among the individual models, CRF had the best precision, the pretrained BERT Multilingual model had the best recall, and the LSTM model had the best F1 score. The best overall performance was achieved by combining the existing models in a majority voting ensemble, which reached an F1 score of 0.892. The presented results are similar to the inter-annotator agreement on our gold standard corpus and are comparable to existing state-of-the-art results for clinical NER reported in the literature.
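As an illustration of the ensemble idea only, token-level majority voting over the outputs of several taggers might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function name `majority_vote`, the label names (`B-DRUG`, `B-SYMPTOM`), and the rule of falling back to `O` when no label wins a strict majority are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token label sequences produced by several models.

    predictions: list of label sequences, one per model, all of equal length.
    fallback: label emitted when no label has a strict majority
              (an assumption made for this sketch).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    n_models = len(predictions)
    combined = []
    # zip(*predictions) yields, for each token, the labels from every model.
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        # Accept the top label only if more than half of the models agree.
        combined.append(label if count * 2 > n_models else fallback)
    return combined


# Hypothetical outputs of three taggers for the same four tokens.
crf = ["O", "B-DRUG", "O", "O"]
bert = ["O", "B-DRUG", "B-SYMPTOM", "O"]
lstm = ["O", "B-DRUG", "B-SYMPTOM", "B-DRUG"]

ensemble = majority_vote([crf, bert, lstm])
print(ensemble)
```

With an odd number of models and a strict-majority rule, a token receives an entity label only when at least two of the three taggers agree, which is one plausible way an ensemble can trade off the precision of one model against the recall of another.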
Conclusion: Existing state-of-the-art models can, without major modifications, provide viable results for clinical named entity recognition when applied to languages with complexity comparable to that of Serbian.
Keywords: BERT; Clinical named entity recognition; Electronic health records; Serbian language; Transformers.
Copyright © 2022 Elsevier B.V. All rights reserved.