Article

Chinese Medical Named Entity Recognition Based on Context-Dependent Perception and Novel Memory Units

Institution of Computer Science and Technology, Changchun Normal University, Changchun 130032, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8471; https://doi.org/10.3390/app14188471
Submission received: 26 July 2024 / Revised: 12 September 2024 / Accepted: 19 September 2024 / Published: 20 September 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Medical named entity recognition (NER) focuses on extracting and classifying key entities from medical texts. Through automated medical information extraction, NER can effectively improve the efficiency of electronic medical record analysis, medical literature retrieval, and intelligent medical question–answering systems, enabling doctors and researchers to obtain the required medical information more quickly and thereby helping to improve the accuracy of diagnosis and treatment decisions. The current methods have certain limitations in dealing with contextual dependencies and entity memory and fail to fully consider the contextual relevance and interactivity between entities. To address these issues, this paper proposes a Chinese medical named entity recognition model that combines contextual dependency perception and a new memory unit. The model combines the BERT pre-trained model with a new memory unit (GLMU) and a recall network (RMN). The GLMU can efficiently capture long-distance dependencies, while the RMN enhances multi-level semantic information processing. The model also incorporates fully connected layers (FC) and conditional random fields (CRF) to further optimize the performance of entity classification and sequence labeling. The experimental results show that the model achieved F1 values of 91.53% and 64.92% on the Chinese medical datasets MCSCSet and CMeEE, respectively, surpassing other related models and demonstrating significant advantages in the field of medical entity recognition.

1. Introduction

Named entity recognition (NER) is a core task in natural language processing (NLP) aimed at identifying and classifying specific entities within text, including the names of people, places, and institutions [1]. Traditional medical NER methods depend on rule-based approaches and dictionary matching, which require manual formulation and maintenance of medical dictionaries, making them difficult to adapt to diverse and evolving texts. Deep learning approaches, including recurrent neural networks (RNNs [2]), long short-term memory networks (LSTMs [3]), convolutional neural networks (CNNs [4]), and pre-trained models such as BERT [5], have demonstrated strong performance in medical NER. However, these approaches often struggle to capture contextual dependencies, particularly when processing long sentences, polysemous words, and homographs. To address these challenges, we propose a Chinese medical NER method that integrates BERT, RNNs, an attention mechanism, and a novel memory unit. The model leverages BERT to capture contextual information, enhances entity and relationship understanding via RNNs and an attention mechanism, and introduces novel memory units to improve long-distance dependency capture. The primary innovations and contributions of this study are:
(1)
New memory unit introduction: The model integrates novel memory units (GLMU and RMN) to enhance its capacity for capturing long-distance dependencies. These memory units more effectively preserve and utilize contextual information, particularly in the processing of lengthy texts and complex sentences, significantly improving NER accuracy.
(2)
Fully connected layer optimization and application: The fully connected layer optimizes feature representation and classification performance in the model. By integrating and mapping high-dimensional features from earlier layers, it enhances classification accuracy, allowing for more precise identification of named entities.
(3)
Complex named entity structure handling: The model effectively processes complex named entity structures. It accurately captures internal entity structures and relationships through the combined efforts of the memory units and fully connected layer, improving recognition accuracy and robustness.
(4)
Enhanced recognition of new words and domain-specific terms: Through the integration of BERT, RNNs, attention mechanisms, novel memory units, and fully connected layers, the model’s ability to recognize new words and domain-specific terms has been markedly enhanced.
In summary, this study proposes a Chinese medical NER model integrating multiple advanced technologies, effectively improving performance in context-dependent perception, complex entity processing, and recognition of new words and domain-specific terms. The introduction and optimization of novel memory units and fully connected layers were particularly significant.

2. Related Work

This section focuses on reviewing existing research that is closely related to our study. Named entity recognition (NER) is one of the core tasks of natural language processing (NLP). To improve recognition accuracy, researchers have proposed a variety of methods, including rule-based, dictionary-based, and deep-learning-based methods.

2.1. Rule- and Dictionary-Based Approaches

Early named entity recognition methods relied mainly on rules and dictionaries: recognition rules were formulated from the experience of medical-domain experts, and medical dictionaries covering entity terms were constructed by hand. Although this approach can achieve good results in some applications, it has clear limitations. First, entities in the medical field are diverse and constantly updated, making it difficult for rules and dictionaries to cover them all. Second, as medical research develops, new medical entity terms keep appearing, leaving gaps in existing dictionaries. Zhu et al. [6] combined medical dictionaries for medical entity extraction. Ke et al. [7] explored combining BiLSTM-CRF models with lexical resources; by introducing medical lexical resources as prior knowledge, the model improves the recognition accuracy of medical terms. Zhao et al. [8] performed entity extraction from medical text through grid tagging and semantic segmentation. Li et al. [9] designed class-specific NER with multi-feature fusion embeddings, which recognizes entities well but is not sufficiently scalable.

2.2. Statistical Machine Learning-Based Methods

With the development of machine learning technology, statistical machine learning methods gradually became mainstream in named entity recognition. By learning entity features from annotated data, these methods reduce the need for handwritten rules and significantly lower labor costs. In the medical field, a large amount of annotated data has been accumulated, such as the CCKS2017 [10], N2C2 [11], CMeEE [12], and CHIP [13] disease corpora, which facilitate the application of statistical machine learning methods. The mainstream statistical machine learning methods include maximum entropy models (MEMs), support vector machines (SVMs), hidden Markov models (HMMs), and conditional random fields (CRFs). Among them, the CRF is widely used as a sequence labeling model in named entity recognition. Yi et al. [14] combined BERT, BiLSTM, and CRF and fused lexical and stroke features for medical named entity recognition. An et al. [15] proposed a self-attention-based BiLSTM-CRF model that improves recognition accuracy by coupling the BiLSTM layer with a multi-head self-attention mechanism.

2.3. Deep-Learning-Based Approaches

Deep learning techniques have made significant progress in the field of natural language processing, especially in named entity recognition tasks. Deep learning models perform end-to-end entity recognition without relying on lexicons or manual feature definitions, reducing dependence on manual effort and existing toolkits. These methods improve entity recognition accuracy by automatically learning features from data. Li et al. [16] proposed an end-to-end Chinese named entity recognition method based on BERT-BiLSTM-ATT-CRF, which combines BERT, BiLSTM, an attention mechanism, and CRFs. Wang et al. [17] built on recurrent neural networks (RNNs), combining word encodings tailored to the specialized features of the medical field and achieving better results than statistical machine learning. Yang et al. [18] used an LSTM model based on the attention mechanism for named entity recognition. Zhong et al. [19] fused word and phrase information when training a BiLSTM model to efficiently handle disease entities in Chinese electronic medical records. Tu et al. [20] used a CNN-BiLSTM model to identify medically relevant entities. Kong et al. [21] used a joint entity and relation extraction method with dilated convolution and context fusion to achieve entity extraction from electronic medical records.
Despite the superiority demonstrated by deep learning methods in named entity recognition, challenges remain, particularly in context understanding and context-dependent entity recognition. To address these challenges, we propose a Chinese medical named entity recognition model that integrates multiple state-of-the-art techniques. This model leverages the BERT pre-training model to capture rich contextual information and combines a fully connected layer with a novel memory unit to enhance performance in context-dependent awareness, complex named entity processing, and recognition of neologisms and domain-specific terms.

3. Method

To enhance the model’s context awareness and focus on contextual associations and inter-entity interactions, we propose an innovative approach that combines the BERT pre-trained model as the main feature extraction layer with recurrent neural networks and novel memory units, along with a fully connected layer, to comprehensively improve the understanding of medical entities and their interrelationships. The overall structure is shown in Figure 1.

3.1. Medical Text Encoder

In the study of named entity recognition in the medical field, selecting an appropriate pre-trained model as an encoder is crucial for improving recognition accuracy. Given the specialized and complex nature of medical terms and expressions, this study adopts a BERT pre-trained model based on the Transformer architecture to better capture complex semantics and terminological features in medical texts.
Transformer encoders are the core of BERT. The Transformer model consists of multiple stacked encoder layers, each containing a self-attention mechanism and a feed-forward neural network. The self-attention mechanism allows the model to process each word while considering all other words in the sequence, capturing contextual relationships, and the feed-forward neural network further processes and transforms the output of the self-attention mechanism. BERT uses a bidirectional encoding approach, allowing it to understand context from both the left and right directions simultaneously, unlike traditional language models that model from only one direction (left-to-right or right-to-left). This bidirectional capability enables BERT to understand the contextual information of words more comprehensively. In medical named entity recognition (NER) tasks, BERT’s bidirectional encoding capability enhances its understanding of the meanings of medical terms in different contexts, which is crucial for accurately recognizing complex medical entities such as disease names, drug names, and symptoms. By considering both left and right contextual information, BERT captures more nuanced semantic relationships in medical texts. BERT excels in fine-grained feature extraction, capturing rich contextual features from input text, which enhances the recognition accuracy of medical entities, particularly with terminology and complex expressions. BERT’s contextual understanding significantly enhances its ability to handle long texts and effectively capture long-distance dependencies. In the medical literature, texts are often long and complex in structure, and BERT’s processing capability makes it particularly effective in handling long sentences and complex structures. To overcome the limitation of BERT input length, we used a segmentation processing method in our experiment to split long texts into multiple segments while maintaining context consistency between segments. We introduced a sliding window mechanism in the processing process to ensure that there is a certain overlap between the segments, thereby reducing the risk of important information being truncated.
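As a rough illustration of the segmentation strategy just described, the sketch below splits a long character sequence into overlapping windows that fit BERT's input limit; the window length and stride are illustrative assumptions, not values reported in the paper.

```python
def sliding_windows(chars, max_len=510, stride=384):
    """Split a long character sequence into overlapping BERT-sized segments.

    max_len leaves room for the [CLS] and [SEP] tokens (512 - 2); consecutive
    windows overlap by max_len - stride characters so that entities near a
    segment boundary appear intact in at least one segment.
    """
    segments = []
    start = 0
    while start < len(chars):
        segments.append(chars[start:start + max_len])
        if start + max_len >= len(chars):
            break
        start += stride
    return segments

# A 1200-character record becomes three overlapping segments.
print([len(s) for s in sliding_windows(["字"] * 1200)])   # [510, 510, 432]
```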
To further improve BERT’s performance regarding medical texts, we adopt a systematic strategy. First, additional pre-training on a large-scale medical corpus enables BERT to learn terminology and linguistic patterns specific to the medical domain, enhancing its comprehension of medical texts. Second, expanding the model’s vocabulary is essential. Adding more medical terminology to the vocabulary significantly reduces the model’s unknown word (OOV, out of vocabulary) problem when processing medical terminology, thus improving recognition accuracy. Additionally, fine-tuning BERT’s attention mechanism is necessary. Optimizing the attention mechanism ensures the model focuses more on medical entities and their contextual information, enhancing its ability to accurately identify and categorize medical named entities such as diseases, drugs, and medical procedures.
To enhance performance on the medical named entity recognition task, BERT and BiLSTM are used as the core processing units in this study. Each offers distinct advantages, and their combination leverages these complementary strengths: BERT’s contextual understanding and BiLSTM’s sequence modeling reinforce each other, allowing the model to adapt to the complex structure and multi-level information extraction needs of medical texts. BERT handles the contextual and semantic information of the text, while BiLSTM efficiently captures long-distance dependencies and sequential relationships within it. This combination enables the model to perform entity recognition and classification more effectively when faced with the complexity and diversity of medical texts, significantly improving overall task performance. Taking the text sequence [“上”, “腹”, “痛”, “恶”, “心”, “是”, “什”, “么”, “原”, “因”] ([“upper”, “abdomen”, “pain”, “nausea”, “heart”, “is”, “what”, “what”, “origin”, “cause”]) as an example, we annotate entities with a BIO-style tagging scheme extended with an end tag: “B” marks the beginning of an entity, “I” the inside of an entity, “E” the end of an entity, and “O” a non-entity character. In this example, the two entities “上腹痛” (“epigastric pain”) and “恶心” (“nausea”) yield the label sequence [B, I, E, B, E, O, O, O, O, O]. In this specific task, the primary role of the BiLSTM layer is to capture bidirectional contextual information in text sequences. Given an input sequence $X = (x_1, x_2, \ldots, x_n)$, BiLSTM processes the information using the following formula:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{h}_{t-1}, x_t), \quad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{h}_{t+1}, x_t), \quad h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t],$$
where $\overrightarrow{h}_t$ denotes the forward LSTM hidden state: for each time step $t$ in the sequence, the forward LSTM receives the current input $x_t$ and the previous hidden state $\overrightarrow{h}_{t-1}$ and computes the current hidden state $\overrightarrow{h}_t$. $\overleftarrow{h}_t$ denotes the backward LSTM hidden state; it operates in the same manner but in the opposite direction, processing the current input $x_t$ and the subsequent hidden state $\overleftarrow{h}_{t+1}$ to generate $\overleftarrow{h}_t$. Finally, the hidden states from the forward and backward LSTMs are concatenated to form $h_t$, the final hidden state for the current time step. This concatenation ensures that the hidden state at each time step contains information from both directions of the sequence. In the context of medical named entity recognition, this structure enables the BiLSTM to consider contextual information both before and after a medical term, which is crucial for understanding the precise meaning of the term and recognizing entity boundaries.
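As a minimal sketch of the BERT-plus-BiLSTM encoding step described in this subsection (the bert-base-chinese checkpoint and the hidden sizes are assumptions for illustration, not the exact configuration used in the paper):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class BertBiLSTMEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # Bidirectional LSTM over BERT's per-character vectors; the concatenated
        # forward/backward states realize h_t = [h_t(forward); h_t(backward)].
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        token_vecs = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(token_vecs)        # (batch, seq_len, 2 * lstm_hidden)
        return h

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
batch = tokenizer(["上腹痛恶心是什么原因"], return_tensors="pt")
encoder = BertBiLSTMEncoder()
print(encoder(batch["input_ids"], batch["attention_mask"]).shape)
```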

3.2. Medical Named Entity Encoder

The Chinese medical named entity encoder proposed in this paper is a multi-level structure that integrates recurrent neural networks (RNNs) and multi-head attention mechanisms to efficiently process medical text data at various levels, thereby improving the recognition and understanding of medical named entities and their complex relationships.

3.2.1. Character-Level Sequence Processing

To effectively recognize character-level linguistic features in medical texts, especially terms containing special characters or abbreviations, we employ recurrent neural networks (RNNs) to process characters sequentially while maintaining a dynamic hidden state. This approach enables the model to handle both character-level and phrase-level data, enhancing its understanding of the text’s context and linguistic structure. Given a sequence of characters $X = (x_1, x_2, \ldots, x_n)$, its formulaic representation is:
$$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h),$$
where $x_t$ denotes the character input at time step $t$, $h_t$ represents the corresponding hidden state, $W_{hh}$ and $W_{xh}$ are the weight matrices, $b_h$ is the bias term, and $f$ is the activation function. The RNN layer processes each character iteratively, retaining a hidden state that captures information from the sequence up to that point, thereby assisting the model in understanding language patterns at the character level.
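In PyTorch terms, the character-level recurrence above corresponds to a plain RNN cell applied step by step; the embedding and hidden dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

char_dim, hidden_dim = 64, 128
rnn_cell = nn.RNNCell(char_dim, hidden_dim)   # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)

chars = torch.randn(10, char_dim)             # embeddings of a 10-character sequence
h = torch.zeros(hidden_dim)                   # initial hidden state
hidden_states = []
for x_t in chars:                             # iterate over time steps
    h = rnn_cell(x_t.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
    hidden_states.append(h)
print(torch.stack(hidden_states).shape)       # torch.Size([10, 128])
```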

3.2.2. Sentence-Level Relationship Processing

Sentence-level complex relationships are processed using the multi-head attention mechanism, which enables the model to focus on different segments of the input sequence through distinct attention heads. Each head can capture various relationships and features, thereby enhancing the model’s overall expressive power. By processing multiple attention heads in parallel, the multi-head attention mechanism allows the model to interpret the input sequence from multiple perspectives simultaneously. This diversity in attention enhances the model’s robustness across various types and styles of text. Even if one attention head fails to capture a specific type of information accurately, other attention heads can compensate for this shortfall, thereby enhancing the overall stability and reliability of the model. The core formula of the multi-head attention mechanism used in this paper is as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_m) W^O,$$
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V),$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V,$$
where $Q$, $K$, and $V$ represent queries, keys, and values, respectively, which can be derived from the output $h'$ or other suitable representations of the phrase-level RNN; $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are the learnable weight matrices in the model; $d_k$ represents the dimensionality of the keys and is used for scaling the dot product; and $m$ denotes the number of attention heads, enabling the model to capture information in parallel. This design allows the sentence-level complex relationship processing sub-module to effectively capture and analyze intricate relationships between entities in medical text, thereby supporting more accurate named entity recognition and relationship extraction.
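The sentence-level module corresponds to standard multi-head self-attention, which PyTorch provides directly; the sketch below runs one self-attention pass over phrase-level hidden states, with assumed dimensions.

```python
import torch
import torch.nn as nn

d_model, num_heads = 256, 8
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

h = torch.randn(2, 20, d_model)            # (batch, seq_len, d_model) phrase-level states
# Self-attention: queries, keys, and values all come from the same sequence.
context, attn_weights = mha(h, h, h)
print(context.shape, attn_weights.shape)   # (2, 20, 256) and (2, 20, 20)
```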

3.3. Novel Memory Unit

To further enhance the performance of the Chinese medical named entity encoder, we introduce a novel memory unit, GLMU + RMN, into the model. The global long-range memory unit (GLMU) enhances the model’s contextual understanding by capturing long-range dependencies and is particularly effective for processing medical information with complex terms and lengthy texts. When combined with the recursive memory mechanism of RMN, the model can more accurately identify and classify medical named entities, thereby enhancing the precision of named entity recognition. GLMU + RMN effectively handles lengthy texts and complex sentences, ensuring that critical information is preserved and enhancing semantic understanding, which enables the model to more accurately interpret medical terminology and descriptions. By expanding the vocabulary and optimizing the memory mechanism, this combination reduces the impact of unknown words and effectively manages complex dependencies, significantly improving the model’s performance in medical text processing. The formula is expressed as follows:
GLMU aims to capture global dependencies in sequences:
$$h_t = \sigma\left(W_h \cdot [h_{t-1}, x_t] + b_h\right),$$
$$c_t = f(h_t, c_{t-1}),$$
where $h_t$ denotes the hidden state at time step $t$, $x_t$ represents the input, $W_h$ is the weight matrix, $b_h$ is the bias vector, $c_t$ is the memory cell state, and $\sigma$ is the activation function. The GLMU preserves long-term dependencies by introducing the memory cell $c_t$:
$$c_t = \tanh\left(W_c h_t + U_c c_{t-1} + b_c\right),$$
This formula shows that the current memory state $c_t$ is a nonlinear combination of the previous memory state $c_{t-1}$ and the current hidden state $h_t$, which can effectively capture long-distance dependencies.
The RMN utilizes a recursive mechanism to save and update memory cells:
$$m_t = \phi\left(W_m \cdot [m_{t-1}, h_t] + b_m\right),$$
where $m_t$ denotes the memory state at time step $t$, $W_m$ represents the weight matrix, $b_m$ is the bias vector, and $\phi$ is the activation function.
The combination of GLMU and RMN can be expressed by the following equation:
$$\tilde{h}_t = \sigma\left(W_{\tilde{h}} \cdot [h_t, m_t] + b_{\tilde{h}}\right),$$
By incorporating novel memory units, the model can more flexibly and effectively process multi-level information in medical texts. This approach not only enhances the model’s understanding of the boundaries and internal structure of medical entities but also improves its ability to recognize complex medical terms and their relationships.
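Because the GLMU and RMN are specified here only through their update equations, the following is a minimal sketch of how those equations could be realized in PyTorch; the module structure, the choice of ReLU for φ, and all dimensions are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GLMU_RMN(nn.Module):
    """Illustrative sketch of the GLMU/RMN update equations (assumed, not the authors' code)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim + input_dim, hidden_dim)   # h_t = sigma(W_h [h_{t-1}, x_t] + b_h)
        self.W_c = nn.Linear(hidden_dim, hidden_dim)               # c_t = tanh(W_c h_t + U_c c_{t-1} + b_c)
        self.U_c = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_m = nn.Linear(2 * hidden_dim, hidden_dim)           # m_t = phi(W_m [m_{t-1}, h_t] + b_m)
        self.W_o = nn.Linear(2 * hidden_dim, hidden_dim)           # h~_t = sigma(W_h~ [h_t, m_t] + b_h~)

    def forward(self, x):                                          # x: (batch, seq_len, input_dim)
        batch, seq_len, _ = x.shape
        hidden = self.W_c.out_features
        h = x.new_zeros(batch, hidden)
        c = x.new_zeros(batch, hidden)
        m = x.new_zeros(batch, hidden)
        outputs = []
        for t in range(seq_len):
            h = torch.sigmoid(self.W_h(torch.cat([h, x[:, t]], dim=-1)))
            c = torch.tanh(self.W_c(h) + self.U_c(c))              # global long-range memory cell
            m = torch.relu(self.W_m(torch.cat([m, h], dim=-1)))    # recursive memory state (phi assumed to be ReLU)
            outputs.append(torch.sigmoid(self.W_o(torch.cat([h, m], dim=-1))))
        return torch.stack(outputs, dim=1)                         # (batch, seq_len, hidden_dim)

print(GLMU_RMN(768, 256)(torch.randn(2, 10, 768)).shape)          # torch.Size([2, 10, 256])
```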

3.4. Fully Connected Layer

To enhance the model’s ability to capture contextual information and improve Chinese named entity recognition performance, we introduce a fully connected layer. This layer nonlinearly combines the input features to extract higher-level, more abstract features and strengthens the model’s decision making by weighting and transforming these features. This improvement helps the model better capture word relationships and contextual information, thereby improving its ability to recognize named entities. Its formula is expressed as follows:
$$y = W x + b,$$
where y is the output vector; W is the weight matrix with dimensions m × n, where m denotes the dimension of the output vector and n denotes the dimension of the input vector; x is the input vector; and b is the bias vector with dimension m. The fully connected layer is a crucial component in constructing a deep learning model, as it enhances the model’s feature extraction and representation capabilities through linear transformation and parameter optimization.
The output of the fully connected layer is processed by an activation function to introduce nonlinearity. In this study, the ReLU activation function is used, and the output is:
$$y = \mathrm{ReLU}(W x + b),$$
The ReLU function maps negative input values to zero and retains positive values, thereby introducing nonlinearity.
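In PyTorch terms, the fully connected layer with ReLU amounts to the following sketch; the hidden size, the tag-set size, and the final projection to per-character tag scores feeding the CRF layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

hidden_dim, num_tags = 512, 4              # e.g., tags {B, I, E, O}; dimensions are illustrative
fc = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim),     # y = Wx + b over the encoder features
    nn.ReLU(),                             # zero out negative activations
    nn.Linear(hidden_dim, num_tags),       # assumed projection to per-character emission scores
)
emissions = fc(torch.randn(2, 10, hidden_dim))   # (batch, seq_len, num_tags)
print(emissions.shape)
```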

3.5. Loss Function

Conditional random fields (CRFs) are commonly employed in the post-processing phase of sequence annotation to enhance the accuracy of label prediction in medical named entity recognition tasks. CRFs can account for label dependencies to optimize sequence annotation results. We incorporate the CRF loss to address the overall accuracy of the labeled sequences, with particular emphasis on label dependencies. The CRF loss can be expressed as follows:
$$\mathcal{L}_{\mathrm{CRF}} = -\log P(y \mid x),$$
where $P(y \mid x)$ represents the conditional probability of the label sequence $y$ given the input sequence $x$, as computed by the CRF layer. By optimizing $P(y \mid x)$, an optimal balance can be achieved between accurate recognition of individual entities and overall sequence correctness, thereby enhancing recognition performance.
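For illustration, the negative log-likelihood above can be computed with the third-party pytorch-crf package; the use of this particular package, and the tag-set size, are assumptions rather than details reported in the paper.

```python
import torch
from torchcrf import CRF   # third-party package: pip install pytorch-crf

num_tags = 4
crf = CRF(num_tags, batch_first=True)

emissions = torch.randn(2, 10, num_tags)       # per-character tag scores from the FC layer
tags = torch.randint(0, num_tags, (2, 10))     # gold label indices
mask = torch.ones(2, 10, dtype=torch.bool)     # marks real (non-padding) characters

loss = -crf(emissions, tags, mask=mask)        # L_CRF = -log P(y | x)
best_paths = crf.decode(emissions, mask=mask)  # Viterbi decoding at inference time
print(loss.item(), best_paths[0])
```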

4. Experiment and Result

4.1. Experimental Datasets

In this study, we utilize the publicly available Chinese medical text datasets MCSCSet [22] and CMeEE [12] to assess our named entity recognition model. MCSCSet (Medical-domain Chinese Spelling Correction Set) is a specialist-annotated Chinese medical text dataset, derived from a diverse range of sources, including medical books, theses, online medical Q&A platforms, and electronic medical records. These diverse sources ensure the richness and representativeness of the dataset, providing researchers with a high-quality annotated resource that effectively supports medical information extraction. Statistics for the MCSCSet dataset are presented in Table 1.
CMeEE (Chinese Medical Named Entity Recognition Dataset) is a dataset specifically designed for Chinese medical named entity recognition (CMNER) and was released by CHIP2020 (China Health Information Processing Conference). It includes text extracted from medical books, theses, electronic medical records, and online medical Q&A platforms. The texts are labeled with various medical entity categories, including diseases, symptoms, drugs, medical tests, treatments, and body parts. Training and evaluating models using this dataset can enhance the accuracy and efficiency of medical text processing. Statistics for the CMeEE dataset are presented in Table 2.
For data preprocessing, each sentence in the dataset is decomposed into individual characters and annotated based on whether the character is part of an entity. Annotation categories are represented as follows: “B” (beginning of entity), “I” (inside of entity), “E” (end of entity), and “O” (outside of entity). Each character is assigned a corresponding label, and the processed data are saved in JSON format for ease of subsequent use.
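As a rough sketch of this preprocessing step (the exact span representation used by the authors is an assumption), each sentence can be split into characters, entity spans mapped to B/I/E/O labels, and the result written out as JSON:

```python
import json

def annotate(sentence, entity_spans):
    """Label each character with B/I/E/O given (start, end) entity spans (end exclusive)."""
    labels = ["O"] * len(sentence)
    for start, end in entity_spans:
        labels[start] = "B"                      # beginning of the entity
        for i in range(start + 1, end - 1):
            labels[i] = "I"                      # inside of the entity
        if end - start > 1:
            labels[end - 1] = "E"                # end of the entity
    return {"chars": list(sentence), "labels": labels}

sample = annotate("上腹痛恶心是什么原因", [(0, 3), (3, 5)])
print(sample["labels"])                          # ['B', 'I', 'E', 'B', 'E', 'O', 'O', 'O', 'O', 'O']
with open("processed.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False)
```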

4.2. Parameterisation and Evaluation Criteria

The experimental setup of this paper is as follows: the CPU is an Intel(R) Xeon(R) 2.40 GHz, the GPU is an NVIDIA GeForce RTX 4090 (24 GB), Python 3.8 and PyTorch 1.14 are used, and the model’s hyperparameters are configured as outlined in Table 3.
During training, dropout regularization was applied to prevent the model from over-relying on noise features in the training data, thereby improving generalization. Early stopping halted training once performance on the validation set stopped improving, preventing overfitting on the training set. Cross-validation was used to confirm that the model performs consistently across different data splits, further verifying its stability and generalization ability. To evaluate the model’s effectiveness, we use precision (P), recall (R), and the F1 score as metrics.
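As an illustration of how these metrics are computed at the entity level, the following sketch scores predicted spans against gold spans by exact match; whether the paper uses exactly this matching protocol is an assumption.

```python
def prf(gold_spans, pred_spans):
    """Entity-level precision/recall/F1 over sets of (sentence_id, start, end, type) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                       # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = [(0, 0, 3, "symptom"), (0, 3, 5, "symptom")]
pred = [(0, 0, 3, "symptom"), (0, 5, 7, "symptom")]
print(prf(gold, pred))    # (0.5, 0.5, 0.5)
```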

4.3. Experimental Setup

This study conducted extensive experiments on two datasets: text is converted into the input format required by BERT, the BERT pre-trained model serves as the base, and a recurrent neural network (RNN) and an attention mechanism are then connected to capture deeper contextual information in the sequence. New memory units (GLMU and RMN) are introduced to enhance the ability to capture long-distance dependencies, and fully connected layers are used for feature integration and classification. This study uses cross-validation to ensure the reliability of the results: each dataset is divided into five subsets, four of which are used for model training and one for validation, and this process is repeated five times per dataset, rotating the validation and training sets so that all the data are covered (a fold-splitting sketch is given after the model list below). This method improves the stability and reliability of the experimental results. To evaluate the effectiveness of the proposed model, we conducted comparative experiments with the following pre-trained models:
(1)
BERT-base [23]: BERT is a pre-trained language model developed by the Google AI Language team based on Transformer architecture. It utilizes a bidirectional encoder to understand contextual relationships in text. BERT-base consists of 12 Transformer layers, each with 768 hidden units and 12 self-attention heads.
(2)
BERT-wwm [24]: BERT-wwm, developed by the Joint Laboratory of HIT and iFLYTEK Research (HFL), is a variant of BERT that employs a whole-word masking strategy during pre-training. This approach masks entire words rather than individual word pieces.
(3)
MacBERT-base [25]: MacBERT, an improved version of BERT, incorporates several enhancements including a revised masking strategy and a denoising task to bolster the model’s robustness and generalization. MacBERT-base uses “word-level masking” and “sentence reconstruction” strategies to better learn contextual semantics during pre-training.
(4)
RoBERTa [26]: RoBERTa, developed by Facebook AI, is an enhanced version of BERT that optimizes the pre-training process. Improvements include utilizing a larger training dataset, extending training duration, eliminating the next-sentence prediction (NSP) task, and dynamically varying the masking pattern.
(5)
BERT-wwm-ext [27]: BERT-wwm-ext, also developed by the Joint Laboratory of HIT and iFLYTEK Research (HFL), is an extended version of BERT-wwm. It uses a larger Chinese corpus for pre-training while continuing to apply the whole-word masking strategy.
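Returning to the cross-validation procedure referenced before the model list, a minimal sketch of the five-fold rotation is given below; the use of scikit-learn's KFold utility is an assumption, as the paper does not state which tooling was used.

```python
from sklearn.model_selection import KFold

samples = list(range(20))                      # stand-in for the annotated sentences
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kfold.split(samples), start=1):
    # Four subsets train the model, the remaining one validates it;
    # the roles rotate so every sample is validated exactly once.
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")
```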
We validate the effectiveness of our proposed Chinese medical named entity recognition model, which integrates context-dependent perception and novel memory units, by comparing it with existing methods. This comparison is conducted using the MCSCSet and CMeEE datasets. The experimental results are presented in Table 4 and Table 5. Our model, BERT-GLMU+RMN-FC-CRF, achieves optimal precision, recall, and F1 scores on the MCSCSet dataset under identical parameter settings, demonstrating its superior performance in named entity recognition.
The experimental results show that BERT-GLMU+RMN-FC-CRF performs exceptionally well on both datasets, achieving F1 scores of 91.53 and 64.92, significantly outperforming other models such as MacBERT-large, BERT-base-CRF, RoBERTa-CRF, BERT-Biaffine, Span-level NER, and PSA. This demonstrates that our proposed model offers substantial advantages in processing medical texts, particularly in recognition accuracy and recall. This advantage is attributed to innovations in the model structure, such as the integration of RNNs with a multi-head attention mechanism and the introduction of novel memory units and fully connected layers. The multi-head attention mechanism allows the model to better discern the importance of each entity and the complex relationships between them, which is especially effective in handling multi-dimensional entity relationships and rich semantic hierarchies in medical texts. Additionally, the novel memory units (e.g., GLMU+RMN) capture complex contextual relationships and multidimensional semantic information, while the fully connected layer further integrates this information to optimize entity classification performance. The integration of novel memory units with the fully connected layer significantly enhances the accuracy of entity recognition by improving the model’s capacity to retain context and integrate information. These results affirm the superior performance and technological advancements of BERT-GLMU+RMN-FC-CRF in medical named entity recognition tasks.

5. Discussion

In this study, we proposed a Chinese medical named entity recognition (NER) model that combines the BERT pre-trained model, RNNs, attention mechanism, novel memory units (GLMU and RMN), and fully connected layers. To investigate whether the enhanced performance of the BERT-GLMU+RMN-FC-CRF method in entity recognition is primarily due to the novel memory unit and fully connected layer, we conducted the following ablation experiments:
(1)
In the first ablation experiment, we removed the fully connected layer, retaining only the BERT-GLMU+RMN model with the CRF layer, to compare the performance of the model without the fully connected layer to that of the full model.
(2)
In the second ablation experiment, we retained the BERT-FC-CRF model and removed the novel memory unit (GLMU+RMN) to evaluate the contribution of the novel memory unit to overall model performance.
(3)
In the third ablation experiment, we retained the BERT-CRF model and removed both the novel memory unit (GLMU+RMN) and the fully connected layer (FC) to evaluate their contributions to the overall model performance.
The results of the ablation experiments are shown in Figure 2, where the BERT-GLMU+RMN-FC-CRF model slightly outperforms the BERT-GLMU+RMN-CRF, BERT-FC-CRF, and BERT-CRF models across all three metrics. These results suggest that the improvement in model effectiveness relies on the proposed dependency perception and novel memory unit modules. This indicates that the novel memory unit (GLMU+RMN) and the fully connected layer (FC) in our model positively impact performance in medical named entity recognition tasks. The experimental results show that the model outperforms existing related models on Chinese medical datasets, demonstrating significant advantages in handling complex named entity structures and identifying domain-specific terms.
As shown in Figure 2, the model performs robustly in terms of precision even after the removal of the fully connected layer (FC), with a precision rate of 92.23%, compared to the original model BERT-GLMU+RMN-FC-CRF’s 92.28%. However, its recall decreases slightly to 90.42% compared to the original model’s 90.61%. This suggests that the absence of the fully connected layer reduces the model’s ability to detect entities. In terms of F1 score, BERT-GLMU+RMN-CRF achieves 91.26%, slightly lower than the 91.53% of the original model. This indicates that the model without the fully connected layer may not handle complex tasks, particularly entity detection, as effectively. When the novel memory unit (GLMU+RMN) is removed, the model’s precision drops to 92.23% and recall decreases to 89.36% compared to the original model’s 92.28% precision and 90.61% recall. This suggests that removing the novel memory unit significantly reduces the model’s ability to detect entities. In terms of F1 score, BERT-FC-CRF achieves 89.65%, which is lower than the original model’s 91.53%. This suggests that BERT-FC-CRF performs moderately and may not be as effective as models incorporating the novel memory unit in handling complex tasks, particularly in context-dependent perception. After removing both the novel memory unit (GLMU+RMN) and the fully connected layer (FC), the model’s precision drops to 90.22%, the recall to 87.41%, and the F1 score to 88.79%, all significantly lower than the original model’s performance (92.28%, 90.61%, and 91.53%, respectively). This indicates that BERT-CRF performs poorly overall. The combination of the novel memory unit and the fully connected layer significantly improves model performance in medical text processing by reducing the impact of unknown words and handling complex dependencies more effectively.

6. Conclusions

This paper designs and implements a BERT-GLMU+RMN-FC-CRF model for named entity recognition in Chinese medical texts. The model introduces a novel memory unit (GLMU) and a memory network (RMN) mechanism integrated with a BERT pre-trained model and a fully connected layer (FC). These innovative designs enable the model to make significant progress in handling complex contexts, long-distance dependencies, and specialized terms in medical texts. Key issues in named entity recognition in Chinese medical texts have been addressed, improving recognition accuracy and generalization through innovative structural designs. Our experimental results indicate that on the MCSCSet dataset, the model achieves an F1 score of 91.53%, precision of 92.28%, and recall of 90.61%. On the CMeEE dataset, it achieves an F1 score of 64.92%, precision of 66.24%, and recall of 64.93%. These results verify the model’s superiority in processing real-world medical data. The scientific significance of this study lies in effectively addressing the recognition of complex named entity structures through innovative memory mechanisms, particularly long-distance semantic dependencies and polysemous terms. The model adapts flexibly to various fields and new medical terminology, significantly enhancing its practical applicability. It demonstrates notable advancements and effectiveness in Chinese medical named entity recognition, offering strong technical support for research and applications in related fields.

Author Contributions

Conceptualization, Y.K. and Y.Y.; methodology, Y.K.; software, W.H.; validation, Y.Y. and Y.K.; formal analysis, W.H.; investigation, W.H.; resources, W.H.; data curation, Y.K.; writing—original draft preparation, Y.K.; writing—review and editing, Y.Y.; visualization, Y.Y.; supervision, Y.Y.; project administration, W.H.; funding acquisition, W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Provincial Science and Technology Development Plan “Research on Multi-Target Classification Techniques Incorporating Efficient Channel Attention Mechanisms.”, approval number YDZJ202401350ZYTS.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to https://github.com/Nanfengguojing0403/MCSCSet-and-CMeEE.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jehangir, B.; Radhakrishnan, S.; Agarwal, R. A survey on Named Entity Recognition—Datasets, tools, and methodologies. Nat. Lang. Process. J. 2023, 3, 100017. [Google Scholar] [CrossRef]
  2. Soltau, H.; Shafran, I.; Wang, M.; El Shafey, L. RNN Transducers for Named Entity Recognition with constraints on alignment for understanding medical conversations. In Proceedings of the INTERSPEECH 2022, Incheon, Republic of Korea, 18–22 September 2022; pp. 1901–1905. [Google Scholar]
  3. Cahuantzi, R.; Chen, X.; Güttel, S. A comparison of LSTM and GRU networks for learning symbolic sequences. In Science and Information Conference; Springer Nature: Cham, Switzerland, 2023; pp. 771–785. [Google Scholar]
  4. Zhang, R.; Zhao, P.; Guo, W.; Wang, R.; Lu, W. Medical named entity recognition based on dilated convolutional neural network. Cogn. Robot. 2022, 2, 13–20. [Google Scholar] [CrossRef]
  5. Lu, Y.; Chen, H.; Zhang, Y.; Peng, J.; Xiang, D.; Zhang, J. Research on entity relation extraction for Chinese medical text. Health Inform. J. 2024, 30, 14604582241274762. [Google Scholar] [CrossRef] [PubMed]
  6. Zhu, Z.; Li, J.; Zhao, Q.; Akhtar, F. A dictionary-guided attention network for biomedical named entity recognition in Chinese electronic medical records. Expert Syst. Appl. 2023, 231, 120709. [Google Scholar] [CrossRef]
  7. Ke, J.; Wang, W.; Chen, X.; Gou, J.; Gao, Y.; Jin, S. Medical entity recognition and knowledge map relationship analysis of Chinese EMRs based on improved BiLSTM-CRF. Comput. Electr. Eng. 2023, 108, 108709. [Google Scholar] [CrossRef]
  8. Zhao, X.; Shi, Z.; Xiang, Y.; Ren, Y. Chinese Named Entity Recognition Based on Grid Tagging and Semantic Segmentation. In Proceedings of the 2023 IEEE 9th International Conference on Cloud Computing and Intelligent Systems (CCIS), Dali, China, 12–13 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 289–294. [Google Scholar]
  9. Li, J.; Meng, K. MFE-NER: Multi-feature fusion embedding for Chinese named entity recognition. arXiv 2021, arXiv:2109.07877. [Google Scholar]
  10. Zheng, D.; Zhang, H.; Yu, F. Named entity recognition of Chinese electronic medical records based on adversarial training and feature fusion. In Proceedings of the 2023 International Joint Conference on Robotics and Artificial Intelligence, Macao, China, 19–25 August 2023; pp. 175–179. [Google Scholar]
  11. Modi, S.; Kasmiran, K.A.; Sharef, N.M.; Sharum, M.Y. Extracting adverse drug events from clinical Notes: A systematic review of approaches used. J. Biomed. Inform. 2024, 151, 104603. [Google Scholar] [CrossRef]
  12. Zhang, N.; Chen, M.; Bi, Z.; Liang, X.; Li, L.; Shang, X.; Yin, K.; Tan, C.; Xu, J.; Huang, F.; et al. Cblue: A chinese biomedical language understanding evaluation benchmark. arXiv 2021, arXiv:2106.08087. [Google Scholar]
  13. Zheng, H.; Guan, M.; Mei, Y.; Li, Y.; Wu, Y. ECNU-LLM@CHIP-PromptCBLUE: Prompt Optimization and In-Context Learning for Chinese Medical Tasks. In Proceedings of the Health Information Processing: Evaluation Track Papers: 9th China Conference, CHIP 2023, Hangzhou, China, 27–29 October 2023; Springer Nature: Berlin/Heidelberg, Germany, 2024; Volume 2080, p. 60. [Google Scholar]
  14. Yi, F.; Liu, H.; Wang, Y.; Wu, S.; Sun, C.; Feng, P.; Zhang, J. Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features. Appl. Sci. 2023, 13, 8913. [Google Scholar] [CrossRef]
  15. An, Y.; Xia, X.; Chen, X.; Wu, F.X.; Wang, J. Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF. Artif. Intell. Med. 2022, 127, 102282. [Google Scholar] [CrossRef]
  16. Li, D.; Tu, Y.; Zhou, X.; Zhang, Y.; Ma, Z. End-to-end chinese entity recognition based on bert-bilstm-att-crf. ZTE Commun. 2022, 20, 27. [Google Scholar]
  17. Wang, W.; Li, X.; Ren, H.; Gao, D.; Fang, A. Chinese clinical named entity recognition from electronic medical records based on multisemantic features by using robustly optimized bidirectional encoder representation from transformers pretraining approach whole word masking and convolutional neural networks: Model development and validation. JMIR Med. Inform. 2023, 11, e44597. [Google Scholar] [PubMed]
  18. Yang, H.; Wang, L.; Yang, Y. Named Entity Recognition in Electronic Medical Records Incorporating Pre-trained and Multi-Head Attention. IAENG Int. J. Comput. Sci. 2024, 51, 401–408. [Google Scholar]
  19. Zhong, J.; Xuan, Z.; Wang, K.; Cheng, Z. A BERT-Span model for Chinese named entity recognition in rehabilitation medicine. PeerJ Comput. Sci. 2023, 9, e1535. [Google Scholar] [CrossRef]
  20. Tu, H.; Han, L.; Nenadic, G. Extraction of Medication and Temporal Relation from Clinical Text by Harnessing Different Deep Learning Models. arXiv 2023, arXiv:2310.02229. [Google Scholar]
  21. Kong, W.; Xia, Y.; Yao, W.; Lu, T. A Joint Entity and Relation Extraction Approach Using Dilated Convolution and Context Fusion. In CCF International Conference on Natural Language Processing and Chinese Computing; Springer Nature: Cham, Switzerland, 2023; pp. 135–146. [Google Scholar]
  22. Jiang, W.; Ye, Z.; Ou, Z.; Zhao, R.; Zheng, J.; Liu, Y.; Liu, B.; Li, S.; Yang, Y.; Zheng, Y. Mcscset: A specialist-annotated dataset for medical-domain Chinese spelling correction. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4084–4088. [Google Scholar]
  23. Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943. [Google Scholar]
  24. Zhang, Y.; Liao, X.; Chen, L.; Kang, H.; Cai, Y.; Wang, Q. Multi-BERT-wwm model based on probabilistic graph strategy for relation extraction. In Proceedings of the Health Information Science: 10th International Conference, HIS 2021, Melbourne, VIC, Australia, 25–28 October 2021; Proceedings 10. Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 95–103. [Google Scholar]
  25. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-training with whole word masking for chinese bert. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514. [Google Scholar] [CrossRef]
  26. Nemkul, K. Use of Bidirectional Encoder Representations from Transformers (BERT) and Robustly Optimized Bert Pretraining Approach (RoBERTa) for Nepali News Classification. Tribhuvan Univ. J. 2024, 39, 124–137. [Google Scholar] [CrossRef]
  27. Wang, S.; Fei, C.; Zhang, M. BEDA: BERT-wwm-ext with Data Augmentation for Similarity Measurement and Difficulty Evaluation of Test Questions. In Proceedings of the 2023 5th International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, China, 8–10 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 6–10. [Google Scholar]
  28. Gu, Y.; Qu, X.; Wang, Z.; Zheng, Y.; Huai, B.; Yuan, N.J. Delving deep into regularity: A simple but effective method for Chinese named entity recognition. arXiv 2022, arXiv:2204.05544. [Google Scholar]
  29. Liu, N.; Hu, Q.; Xu, H.; Xu, X.; Chen, M. Med-BERT: A pretraining framework for medical records named entity recognition. IEEE Trans. Ind. Inform. 2021, 18, 5600–5608. [Google Scholar] [CrossRef]
  30. Han, D.; Wang, Z.; Li, Y.; Zhang, J. Segmentation-aware relational graph convolutional network with multi-layer CRF for nested named entity recognition. Complex Intell. Syst. 2024, 1–13. [Google Scholar] [CrossRef]
  31. Zhou, L.; Chen, Y.; Li, X.; Li, Y.; Li, N.; Wang, X.; Zhang, R. A New Adapter Tuning of Large Language Model for Chinese Medical Named Entity Recognition. Appl. Artif. Intell. 2024, 38, 2385268. [Google Scholar] [CrossRef]
Figure 1. Overview framework of our proposed model.
Figure 2. Comparison of ablation experiment results.
Table 1. Distribution statistics of entities in the MCSCSet dataset.

Sample Size    Training Sets    Validators    Test Sets    Medical Entities
196,496        157,194          19,652        19,650       81,020
Table 2. Distribution statistics of entities in CMeEE dataset.

Sample Size    Training Sets    Validators    Test Sets
23,000         15,000           5000          3000
Table 3. Parameter settings.

Hyperparameter         Value
Epochs                 100
Batch size             32
Learning rate          5 × 10−5
Learning rate decay    0.01
Text length            512
BiLSTM dimensions      512
Table 4. Experimental results of the MCSCSet dataset.

Data Set    Model                          Precision (%)    Recall (%)    F1 (%)
MCSCSet     BERT-base-CRF                  90.22            87.41         88.79
            BERT-wwm-CRF                   90.82            88.20         89.49
            MacBERT-base-CRF               89.13            87.98         88.55
            RoBERTa-CRF                    92.10            90.52         91.30
            BERT-GLMU+RMN-FC-CRF (Ours)    92.28            90.61         91.53
Table 5. Experimental results of the CMeEE dataset.

Data Set    Model                          Precision (%)    Recall (%)    F1 (%)
CMeEE       MacBERT-large [28]             –                –             62.40
            BERT-Biaffine [29]             64.17            61.29         62.29
            Span-level NER [30]            66.40            62.40         64.50
            PSA [31]                       65.97            60.96         63.36
            BERT-GLMU+RMN-FC-CRF (Ours)    66.24            64.93         64.92
