FLCMC: Federated Learning Approach for Chinese Medicinal Text Classification

Guang Hu; Xin Fang

doi:10.3390/e26100871

Entropy (Basel). 2024 Oct 17;26(10):871. doi: 10.3390/e26100871.

Authors

Guang Hu¹ ² ³, Xin Fang¹

Affiliations

¹ School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China.
² School of Computer Science, Fudan University, Shanghai 200438, China.
³ Shanghai Key Laboratory of Data Science, Shanghai 200438, China.

Abstract

To address privacy protection and data sharing issues in Chinese medical texts, this paper introduces FLCMC, a federated learning approach for Chinese medical text classification. The paper first discusses the data heterogeneity problem in federated language modeling and then proposes two perturbed federated learning algorithms based on the self-attention mechanism, FedPA and FedPAP. In these algorithms, the self-attention mechanism is incorporated into the model aggregation module, while the local update module adds a perturbation term that measures the difference between the client and server models and uses a customized PAdam optimizer. In addition, to enable a fair performance comparison, existing federated algorithms are improved by integrating a customized Adam optimizer. In the experiments, the paper first analyzes hyperparameters, data heterogeneity, and validity on synthetic datasets; the results show that the proposed algorithms have significant advantages in classification performance and convergence stability on heterogeneous data. The algorithms are then applied to Chinese medical text datasets to verify their effectiveness on real data. A comparative analysis of algorithm performance and communication efficiency shows that the algorithms generalize well across deep learning models for Chinese medical texts. On the synthetic datasets, compared with the baselines FedAvg, FedProx, and FedAtt and their improved versions, both FedPA and FedPAP converge significantly more accurately and stably on data with general heterogeneity.
On IMCS-V2, a real Chinese medical dataset of doctor-patient conversations, with logistic regression and a long short-term memory (LSTM) network as the training models, FedPA and FedPAP both achieve the best accuracy among the three baseline algorithms above and their improved versions, and both converge significantly more stably and accurately. This indicates that the proposed method classifies Chinese medical texts more effectively.
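The abstract describes the two ingredients of FedPA and FedPAP only in words. The following is a minimal, hedged sketch of one federated round, not the authors' code. It assumes (a) the "perturbation term" is a FedProx-style proximal penalty (μ/2)‖w − w_server‖² whose gradient is added before a standard Adam update (the paper's PAdam may differ), and (b) the self-attention aggregation follows FedAtt's layer-wise scheme, where each client's weight is a softmax over its parameter distance to the server and the server moves by a step ε toward the clients. The function names (`perturbed_adam_step`, `attention_aggregate`, `local_update`), the toy quadratic client losses, and all hyperparameter values are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Model = Dict[str, Vector]  # layer name -> flattened parameters of that layer


def perturbed_adam_step(w: Vector, grad: Vector, w_server: Vector, state: dict,
                        mu: float = 0.01, lr: float = 0.01, beta1: float = 0.9,
                        beta2: float = 0.999, eps: float = 1e-8) -> Vector:
    """One Adam step on f(w) + (mu / 2) * ||w - w_server||^2 (assumed perturbation term)."""
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    m = state.setdefault("m", [0.0] * len(w))
    v = state.setdefault("v", [0.0] * len(w))
    new_w = []
    for i, (wi, gi, si) in enumerate(zip(w, grad, w_server)):
        g = gi + mu * (wi - si)  # proximal gradient pulls w toward the server model
        m[i] = beta1 * m[i] + (1 - beta1) * g      # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)            # bias corrections
        v_hat = v[i] / (1 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_w


def attention_aggregate(server: Model, clients: List[Model], step: float = 1.0) -> Model:
    """FedAtt-style layer-wise attentive aggregation (assumed form of the aggregation module)."""
    new_server = {}
    for name, ws in server.items():
        # Euclidean distance between the server layer and each client's version of it.
        dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(ws, c[name]))) for c in clients]
        # Softmax over the distances gives per-client attention weights for this layer.
        top = max(dists)
        exps = [math.exp(d - top) for d in dists]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Move the server layer by `step` toward the attention-weighted client layers.
        new_server[name] = [
            wj - step * sum(a * (wj - c[name][j]) for a, c in zip(alphas, clients))
            for j, wj in enumerate(ws)
        ]
    return new_server


def local_update(server: Model, target: Model, steps: int = 5,
                 mu: float = 0.01, lr: float = 0.05) -> Model:
    """Hypothetical toy client minimizing 0.5 * ||w - target||^2 per layer."""
    w = {name: list(vals) for name, vals in server.items()}
    states = {name: {} for name in server}  # separate Adam state per layer
    for _ in range(steps):
        for name in w:
            grad = [wi - ti for wi, ti in zip(w[name], target[name])]
            w[name] = perturbed_adam_step(w[name], grad, server[name], states[name],
                                          mu=mu, lr=lr)
    return w


if __name__ == "__main__":
    server: Model = {"layer": [0.0, 0.0]}
    # Three heterogeneous toy clients with different optima (made-up data).
    targets = [{"layer": [1.0, 0.0]}, {"layer": [0.0, 1.0]}, {"layer": [1.0, 1.0]}]
    for _ in range(20):
        clients = [local_update(server, t) for t in targets]
        server = attention_aggregate(server, clients, step=0.5)
    print(server["layer"])
```

With `step=1.0` the server update reduces to the attention-weighted average Σ_k α_k w_k, while smaller steps keep part of the previous server model. Because FedAtt applies the softmax to distances, clients farther from the server receive larger weights. The abstract does not state how FedPAP differs from FedPA, so that distinction is not modeled here.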

Keywords: Adam optimizer; Chinese medical text; federated learning; self-attention mechanism; text classification.

Grants and funding

This research received no external funding.