JTIS: enhancing biomedical document-level relation extraction through joint training with intermediate steps

Database (Oxford). 2024 Dec 19:2024:baae125. doi: 10.1093/database/baae125.

Abstract

Biomedical Relation Extraction (RE) is central to Biomedical Natural Language Processing and is crucial for various downstream applications. Existing RE challenges in the field of biology have primarily focused on intra-sentential analysis. However, with the rapid increase in the volume of literature and the complexity of relationships between biomedical entities, it often becomes necessary to consider multiple sentences to fully extract the relationship between a pair of entities. Current methods often fail to fully capture the complex semantic structures of information in documents, thereby affecting extraction accuracy. Therefore, unlike traditional RE methods that rely on sentence-level analysis and heuristic rules, our method focuses on extracting entity relationships from biomedical literature titles and abstracts and classifying relations that are novel findings. In our method, a multitask training approach is employed for fine-tuning a Pre-trained Language Model in the field of biology. Based on a broad spectrum of carefully designed tasks, our multitask method not only extracts relations of better quality due to more effective supervision but also achieves a more accurate classification of whether the entity pairs are novel findings. Moreover, by applying a model ensemble method, we further enhance our model's performance. The extensive experiments demonstrate that our method achieves significant performance improvements, i.e. surpassing the existing baseline by 3.94% in RE and 3.27% in Triplet Novel Typing in F1 score on BioRED, confirming its effectiveness in handling complex biomedical literature RE tasks. Database URL: https://codalab.lisn.upsaclay.fr/competitions/13377#learn_the_details-dataset.

MeSH terms

  • Data Mining* / methods
  • Humans
  • Natural Language Processing*
  • Semantics