A Large Language Model to Detect Negated Expressions in Radiology Reports

Yvonne Su; Yonatan B Babore; Charles E Kahn Jr

doi:10.1007/s10278-024-01274-9

J Imaging Inform Med. 2024 Sep 25. doi: 10.1007/s10278-024-01274-9. Online ahead of print.

Authors

Yvonne Su^#¹, Yonatan B Babore^#¹, Charles E Kahn Jr²,³

Affiliations

¹ Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, 19104, PA, USA.
² Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, 19104, PA, USA.
³ Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.

^# Contributed equally.

PMID: 39322813

Abstract

Natural language processing (NLP) is crucial for accurately extracting information from unstructured text to support clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model in detecting negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system in detecting negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined that a sample of 382 terms was needed to achieve α = 0.05 and a power (1 − β) of 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
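The reported metrics can be related to one another with a short sketch. The Python below is illustrative only and is not the authors' analysis code: the function names `f1_score` and `mcnemar_chi2` are hypothetical, and the discordant-pair counts in the usage lines are made-up values, since the abstract does not report the underlying contingency table. F1 is computed as the harmonic mean of precision and recall, and the McNemar statistic uses the standard chi-square form based on the two discordant counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcnemar_chi2(b: int, c: int, continuity: bool = False) -> float:
    """Return the McNemar chi-square statistic for paired classifiers.

    b: items classifier A got right and classifier B got wrong
    c: items classifier A got wrong and classifier B got right
    continuity: apply the continuity correction (|b - c| - 1)^2 / (b + c)
    """
    if b + c == 0:
        return 0.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff * diff / (b + c)


# F1 recomputed from the rounded precision and recall in the abstract.
medspacy_f1 = f1_score(precision=0.356, recall=0.795)
canbert_f1 = f1_score(precision=0.768, recall=0.785)

# Hypothetical discordant counts, chosen only to show the call.
example_chi2 = mcnemar_chi2(b=10, c=30)
```

Recomputing from the rounded values gives about 0.492 for medspaCy, matching the abstract, and about 0.776 for CAN-BERT, which differs from the reported 0.777 in the third decimal place, presumably because the published figure was derived from unrounded counts.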

Keywords: Large language models; Named entity recognition; Natural language processing; Negated expression (negex) detection; Radiology reports.