Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition

Belkadi, Samuel; Han, Lifeng; Wu, Yuping; Nenadic, Goran


arXiv:2210.12770 (cs)

[Submitted on 23 Oct 2022 (v1), last revised 30 Oct 2023 (this version, v4)]



Abstract: The practice of fine-tuning Pre-trained Language Models (PLMs) from general or domain-specific data to a specific task with limited resources has gained popularity within the field of natural language processing (NLP). In this work, we re-visit this assumption and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs, namely BERT, BioBERT, and ClinicalBERT. Furthermore, we examine the impact of an additional CRF layer on such models to encourage contextual learning. We use n2c2-2018 shared task data for model development and evaluations. The experimental outcomes show that 1) CRF layers improved all language models; 2) referring to BIO-strict span-level evaluation using macro-average F1 score, although the fine-tuned LLMs achieved 0.83+ scores, the TransformerCRF model trained from scratch achieved 0.78+, demonstrating comparable performances with much lower cost, e.g. with 39.80% fewer training parameters; 3) referring to BIO-strict span-level evaluation using weighted-average F1 score, ClinicalBERT-CRF, BERT-CRF, and TransformerCRF exhibited smaller score differences, with 97.59%/97.44%/96.84% respectively; 4) applying efficient training by down-sampling for better data distribution further reduced the training cost and need for data, while maintaining similar scores, i.e. around 0.02 points lower compared to using the full dataset. Our models will be hosted at this https URL
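The abstract reports both macro-average and weighted-average F1 under BIO-strict span-level evaluation, and the gap between models differs between the two. The following is a minimal sketch of how the two averages can diverge, not the authors' evaluation code. The tag sequences are toy data invented for illustration, the label names `Drug` and `ADE` are only examples, and the helper names `bio_spans` and `span_f1` are hypothetical.

```python
from collections import Counter


def bio_spans(tags):
    """Return the set of (label, start, end) spans in a BIO tag sequence."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.add((label, start, i))  # end index is exclusive
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """Strict span-level F1 per label, plus macro and weighted averages.

    A predicted span counts only if label, start, and end all match.
    Labels are taken from the gold annotation (an assumption of this sketch).
    """
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    support = Counter(lab for lab, _, _ in gold)
    per_label = {}
    for lab in sorted(support):
        g = {s for s in gold if s[0] == lab}
        p = {s for s in pred if s[0] == lab}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        per_label[lab] = 2 * precision * recall / total if total else 0.0
    macro = sum(per_label.values()) / len(per_label)
    weighted = sum(per_label[l] * support[l] for l in per_label) / sum(support.values())
    return per_label, macro, weighted


# Toy data: three frequent-label spans predicted exactly, one rare-label
# span predicted with the wrong right boundary.
gold = ["B-Drug", "O", "B-Drug", "I-Drug", "O", "B-Drug", "O", "B-ADE", "I-ADE"]
pred = ["B-Drug", "O", "B-Drug", "I-Drug", "O", "B-Drug", "O", "B-ADE", "O"]

per_label, macro, weighted = span_f1(gold, pred)
print(per_label, macro, weighted)
```

On this toy input the rare label scores 0 and the frequent label scores 1, so the macro average is 0.5 while the weighted average is 0.75. Weighting by support lets frequent labels dominate, which is one plausible reason weighted scores sit closer together across models than macro scores.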

Comments:	working paper - Large Language Models, Fine-tuning LLMs, Clinical NLP, Medication Mining, AI for Healthcare
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2210.12770 [cs.CL]
	(or arXiv:2210.12770v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.12770

Submission history

From: Lifeng Han Dr
[v1] Sun, 23 Oct 2022 16:27:31 UTC (403 KB)
[v2] Mon, 31 Oct 2022 15:39:41 UTC (406 KB)
[v3] Sat, 21 Oct 2023 19:26:46 UTC (3,275 KB)
[v4] Mon, 30 Oct 2023 17:56:49 UTC (4,120 KB)
