Background and objective: Automatic clinical coding is a crucial task in extracting relevant information from the unstructured medical documents contained in Electronic Health Records (EHR). However, most existing computer-based methods for clinical coding act as "black boxes" that do not explain the reasons for their code assignments, which greatly limits their applicability to real-world medical scenarios. The objective of this study is to use transformer-based models to tackle explainable clinical coding effectively. To that end, we require the models not only to assign clinical codes to medical cases but also to provide the reference in the text that justifies each coding assignment.
Methods: We examine the performance of three transformer-based architectures on three different explainable clinical-coding tasks. For each transformer, we compare the performance of the original general-domain version with an in-domain version of the model adapted to the specificities of the medical domain. We address the explainable clinical-coding problem as a dual medical named entity recognition (MER) and medical named entity normalization (MEN) task. For this purpose, we developed two different approaches, namely a multi-task strategy and a hierarchical-task strategy.
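The dual MER and MEN formulation can be illustrated with a minimal sketch of a two-stage (hierarchical-style) pipeline: a recognizer first proposes mention spans, and a normalizer then assigns a code to each mention given its surrounding context. Everything named here is an assumption made for illustration and is not taken from the study: `hierarchical_coding`, the `window` parameter, the toy lexicon, and the placeholder codes `CODE-A` and `CODE-B` are hypothetical, and the dictionary lookups merely stand in for the transformer models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class CodedMention:
    start: int
    end: int
    text: str
    code: str


def hierarchical_coding(
    document: str,
    recognize: Callable[[str], List[Span]],
    normalize: Callable[[str, str], str],
    window: int = 30,
) -> List[CodedMention]:
    """Stage 1 (MER) finds mention spans; stage 2 (MEN) codes each mention in context."""
    results = []
    for start, end in recognize(document):
        mention = document[start:end]
        # Hypothetical context window: a few characters on each side of the mention.
        context = document[max(0, start - window):end + window]
        results.append(CodedMention(start, end, mention, normalize(mention, context)))
    return results


# Toy stand-ins for the two models (placeholder codes, not real clinical codes).
TOY_LEXICON = {"carcinoma": "CODE-A", "metastasis": "CODE-B"}


def toy_recognizer(document: str) -> List[Span]:
    """Return sorted spans of every lexicon term found in the document."""
    lowered = document.lower()
    spans = []
    for term in TOY_LEXICON:
        pos = lowered.find(term)
        while pos != -1:
            spans.append((pos, pos + len(term)))
            pos = lowered.find(term, pos + 1)
    return sorted(spans)


def toy_normalizer(mention: str, context: str) -> str:
    """Ignore the context and look the mention up; a real model would use both."""
    return TOY_LEXICON[mention.lower()]


doc = "Biopsy confirmed carcinoma with liver metastasis."
coded = hierarchical_coding(doc, toy_recognizer, toy_normalizer)
for m in coded:
    print(m.start, m.end, m.text, m.code)
```

Each returned record carries both the assigned code and the character offsets of the supporting text, which is the explainability requirement described above: every code comes with the reference that justifies it.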
Results: For each analyzed transformer, the clinical-domain version significantly outperforms the corresponding general-domain model across the three explainable clinical-coding tasks analyzed in this study. Furthermore, the hierarchical-task approach significantly outperforms the multi-task strategy. Specifically, combining the hierarchical-task strategy with an ensemble approach that leverages the predictive capabilities of the three distinct clinical-domain transformers yields the best results, with precision, recall and F1-score of 0.852, 0.847 and 0.849 on the Cantemist-Norm task and 0.718, 0.566 and 0.633 on the CodiEsp-X task, respectively.
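As a quick consistency check on the reported triples, recall that the F1-score is the harmonic mean of precision and recall and therefore always lies between them. The short sketch below recomputes it from the value pairs 0.852/0.847 and 0.718/0.566; the helper name `f1_score` is our own and is not part of the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the reported results.
reported = {
    "Cantemist-Norm": (0.852, 0.847),
    "CodiEsp-X": (0.718, 0.566),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f1_score(p, r):.3f}")
# Cantemist-Norm: F1 = 0.849
# CodiEsp-X: F1 = 0.633
```

Both recomputed values match the third figure of each reported triple, so each triple is internally consistent when read as precision, recall and F1-score.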
Conclusions: By addressing the MER and MEN tasks separately, and by treating the MEN task as a context-aware text-classification problem, the hierarchical-task approach effectively reduces the intrinsic complexity of explainable clinical coding, allowing the transformers to establish new state-of-the-art (SOTA) results on the predictive tasks considered in this study. In addition, the proposed methodology has the potential to be applied to other clinical tasks that require both the recognition and normalization of medical entities.
Keywords: Clinical Coding; Deep Learning; Explainable Artificial Intelligence; Medical Entity Normalization; Natural Language Processing; Transformers.
Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.