Event-Based Clinical Finding Extraction from Radiology Reports with Pre-trained Language Model

Wilson Lau; Kevin Lybarger; Martin L Gunn; Meliha Yetisgen

doi:10.1007/s10278-022-00717-5

J Digit Imaging. 2023 Feb;36(1):91-104. doi: 10.1007/s10278-022-00717-5. Epub 2022 Oct 17.

Authors

Wilson Lau¹, Kevin Lybarger², Martin L Gunn³, Meliha Yetisgen²

Affiliations

¹ Biomedical & Health Informatics, School of Medicine, University of Washington, Seattle, WA, USA.
² Biomedical & Health Informatics, School of Medicine, University of Washington, Seattle, WA, USA.
³ Department of Radiology, School of Medicine, University of Washington, Seattle, WA, USA.

Abstract

Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.

Keywords: Deep learning; Event extraction; Information extraction; Natural language processing.
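The event-based representation described in the abstract can be pictured as a trigger span with a finding type and a set of role-labeled argument spans. The following is a minimal sketch of that idea, not the authors' implementation: the class names, the `span_of` helper, the example sentence, and the exact-match F1 criterion are all assumptions made for illustration. Only the finding types ("lesion", "medical problem") and the argument roles (assertion, anatomy, characteristics, size, count) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Span:
    """A contiguous piece of report text with character offsets."""
    text: str
    start: int
    end: int


@dataclass(frozen=True)
class Argument:
    """An argument entity linked to a trigger by a role.

    Roles named in the abstract: assertion, anatomy, characteristics,
    size, count.
    """
    role: str
    span: Span


@dataclass
class FindingEvent:
    """One finding event: a typed trigger plus its arguments."""
    finding_type: str  # "lesion" or "medical problem"
    trigger: Span
    arguments: List[Argument] = field(default_factory=list)


def span_of(report: str, phrase: str) -> Span:
    """Hypothetical helper: locate the first occurrence of phrase."""
    start = report.index(phrase)
    return Span(phrase, start, start + len(phrase))


def f1_score(gold: Iterable[Tuple], predicted: Iterable[Tuple]) -> float:
    """F1 over exact-match items (an assumed scoring criterion)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence, not taken from the corpus.
report = "There is a 4 mm nodule in the right upper lobe."
event = FindingEvent(
    finding_type="lesion",
    trigger=span_of(report, "nodule"),
    arguments=[
        Argument("size", span_of(report, "4 mm")),
        Argument("anatomy", span_of(report, "right upper lobe")),
    ],
)

# Score argument roles as (trigger text, role, argument text) triples.
gold_roles = [(event.trigger.text, a.role, a.span.text) for a in event.arguments]
predicted_roles = [("nodule", "size", "4 mm"), ("nodule", "anatomy", "upper lobe")]
role_f1 = f1_score(gold_roles, predicted_roles)
```

Here one of the two predicted role triples matches the gold annotation exactly, so precision and recall are both 0.5. The paper may score partial span overlaps differently; the exact-match rule is used only to keep the sketch short.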

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Humans
Natural Language Processing
Radiology*
Research Report
Semantics
Tomography, X-Ray Computed
