Background: Incidental findings of aortic aneurysms (AAs) often go unreported, and established patients are frequently lost to follow-up. Natural language processing (NLP) offers a promising solution to address these issues. While rule-based NLP methods have shown some success, recent advancements in transformer-based large language models (LLMs) remain underutilized. This study has 3 aims: (1) to evaluate the effectiveness of our innovative transformer-based NLP pipeline for AA detection; (2) to detail the clinical impact by quantifying the number of patients who could benefit from such technology; and (3) to use this information to help coordinate appointments with patients, ensuring proper monitoring and management.
Methods: A total of 3,229 radiology reports were divided into 3 batches with varying class balance. Each report was processed through our innovative NLP pipeline, where it was fragmented using regular expression functions to isolate relevant textual segments. These segments were subsequently processed through our "question and find" (Q&F) function, powered by Google's bidirectional encoder representations from transformers (BERT), a well-established transformer LLM. This Q&F function extracted aortic diameter measurements and flagged those that exceeded a predefined threshold. Following detection, we conducted comprehensive chart reviews and contacted primary care providers (PCPs) and patients to categorize aneurysms as "known" or "incidental." We also assessed whether patients with known aneurysms were adhering to yearly screenings and coordinated follow-up appointments.
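The pipeline steps described in the Methods (regular expression fragmentation, measurement extraction, threshold flagging) can be illustrated with a minimal sketch. This is not the study's implementation: the BERT-powered Q&F step is replaced here by a simple regular expression stand-in so that the example runs without a model, and the names `split_into_segments`, `extract_diameter_cm`, `flag_report`, and the 3.0 cm threshold are all hypothetical choices made for illustration.

```python
import re
from typing import List, Optional

# Hypothetical threshold in centimeters; the abstract does not state the value used.
THRESHOLD_CM = 3.0

# Assumed patterns: any mention of the aorta, and a number followed by cm or mm.
AORTA_PATTERN = re.compile(r"aort", re.IGNORECASE)
MEASUREMENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|mm)\b", re.IGNORECASE)


def split_into_segments(report: str) -> List[str]:
    """Split a report into sentences and keep only those mentioning the aorta."""
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    return [s for s in sentences if AORTA_PATTERN.search(s)]


def extract_diameter_cm(segment: str) -> Optional[float]:
    """Stand-in for the Q&F step: return the largest measurement in cm, or None.

    The study used a BERT-based function here; this regex version is only
    a placeholder so the sketch is self-contained.
    """
    values = []
    for number, unit in MEASUREMENT_PATTERN.findall(segment):
        value = float(number)
        if unit.lower() == "mm":
            value /= 10.0  # convert millimeters to centimeters
        values.append(value)
    return max(values) if values else None


def flag_report(report: str, threshold_cm: float = THRESHOLD_CM) -> bool:
    """Flag a report if any aortic measurement exceeds the threshold."""
    for segment in split_into_segments(report):
        diameter = extract_diameter_cm(segment)
        if diameter is not None and diameter > threshold_cm:
            return True
    return False
```

In a closer approximation of the described method, `extract_diameter_cm` would pass each segment and a question such as "What is the diameter of the aorta?" to an extractive question-answering model and parse the returned span; the fragmentation and thresholding logic around it would stay the same.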
Results: Evaluation of the 3 batches showed high F1 scores: 99.4% (95% CI [98.5-100]), 96.7% (95% CI [95.0-98.2]), and 98.9% (95% CI [98.0-99.6]). Measurement accuracy for the 3 batches was 98.9% (95% CI [97.6-100]), 99.6% (95% CI [99.3-99.9]), and 98.1% (95% CI [96.8-99.4]). Compared with manual chart review, the NLP system was more accurate, making fewer errors in each batch: 12 vs. 22 (P = 0.084), 47 vs. 98 (P = 0.000021), and 31 vs. 53 (P = 0.015). Of the 412 patients investigated, 58 (14.1%) had incidental findings, 54 patients (15.3%) were lost to follow-up, 39 patients (55.7%) were successfully contacted, and 37 follow-up appointments (12.1%) were successfully coordinated.
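For readers unfamiliar with the F1 score reported above, the following sketch shows the standard definition as the harmonic mean of precision and recall. The function name `f1_score` and the counts in the usage note are hypothetical; the abstract does not report the underlying confusion counts or the method used to compute the confidence intervals, so neither is reproduced here.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Compute F1 as the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As an illustration with made-up counts, `f1_score(90, 10, 10)` gives precision and recall of 0.9 each, and therefore an F1 of 0.9.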
Conclusions: The high performance achieved in our study demonstrates that transformer-based NLP can enhance AA surveillance. Our subsequent comprehensive patient profiling highlighted the need for such a system as a safety net within the electronic medical record, systematically reviewing radiology reports to detect incidental findings and patients lost to follow-up. Such a system supports appropriate referrals and monitoring, improving patient outcomes and health-care efficiency through timely clinical interventions.
Copyright © 2024 Elsevier Inc. All rights reserved.