Research of multi-label text classification based on label attention and correlation networks

PLoS One. 2024 Sep 30;19(9):e0311305. doi: 10.1371/journal.pone.0311305. eCollection 2024.

Abstract

Multi-Label Text Classification (MLTC) is a crucial task in natural language processing. Compared to single-label text classification, MLTC is more challenging because of its large label space, which raises the problems of extracting local semantic information, learning label correlations, and handling label data imbalance. This paper proposes Label Attention and Correlation Networks (LACN), a model that addresses these challenges and improves classification performance. The model employs a label attention mechanism to obtain a more discriminative text representation and uses a correlation network based on label distribution to enhance the classification results. In addition, a weight factor based on the number of samples is combined with a modulation function based on prediction probability to alleviate label data imbalance. Extensive experiments are conducted on the widely used conventional datasets AAPD and RCV1-v2 and on the extreme datasets EUR-LEX and AmazonCat-13K. The results indicate that the proposed model can handle extreme multi-label data and achieves the best or second-best results compared with state-of-the-art methods. On the AAPD dataset, it outperforms the second-best method by 2.05% ∼ 5.07% in precision@k and by 2.10% ∼ 3.24% in NDCG@k for k = 1, 3, 5. These results demonstrate the effectiveness of LACN and its competitiveness on MLTC tasks.
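
The abstract reports results in precision@k and NDCG@k but does not define them. The following is a minimal sketch of the definitions commonly used in extreme multi-label classification, assuming binary relevance labels and one real-valued score per label. The function names (`top_k_indices`, `precision_at_k`, `ndcg_at_k`) and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
import math


def top_k_indices(scores, k):
    """Indices of the k highest-scoring labels (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def precision_at_k(y_true, scores, k):
    """Fraction of the k top-ranked labels that are relevant."""
    top = top_k_indices(scores, k)
    return sum(y_true[i] for i in top) / k


def ndcg_at_k(y_true, scores, k):
    """DCG of the top-k ranking divided by the best achievable DCG at k."""
    top = top_k_indices(scores, k)
    # Rank positions start at 1, so the discount for position r is log2(r + 1).
    dcg = sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(top))
    # Ideal ranking places all relevant labels first (at most k of them).
    n_relevant = min(k, sum(y_true))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(n_relevant))
    return dcg / idcg if idcg > 0 else 0.0
```

For a document with true labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.1, 0.2]`, precision@3 is 2/3, since two of the three top-ranked labels are relevant. Dataset-level figures such as those quoted for AAPD are typically averages of these per-document values over the test set.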

MeSH terms

  • Algorithms
  • Data Mining / methods
  • Humans
  • Natural Language Processing*
  • Semantics

Grants and funding

This paper is funded by the National Natural Science Foundation of China under Grant No.62272180, the Hubei Provincial Teaching and Research Project for Higher Education Institutions (No.2022570), the Wuhan Education Science Planning Project (No.2022C151), the Wuhan Vocational College of Software and Engineering Research Startup funding project (Grant No.KYQDJF2023004), and the Wuhan Vocational College of Software and Engineering 2023 Doctor Team Science and Technology Innovation Platform Project (No.BSPT2023001).