Purpose: To evaluate the clinical performance of a Protocol Recommendation System (PRS) for the automatic protocolling of chest CT imaging requests.

Materials and Methods: 322 387 consecutive historical imaging requests for chest CT between 2017 and 2022 were extracted from a radiology information system (RIS) database containing 16 associated patient information fields. Records with missing fields and protocols with <100 occurrences were removed, leaving 18 protocols for training. After free-text pre-processing and the application of CLEVER terminology word replacements, the features of a bag-of-words model were used to train a multinomial logistic regression classifier. Four readers protocolled 300 cases with known clinically executed protocols (CEPs), based on all clinically available information. After their selections were made, the PRS and CEP were unblinded, and the readers scored their agreement (1 = severe error, 2 = moderate error, 3 = disagreement but acceptable, 4 = agreement). The ground truth was established by the readers' majority selection; a judge broke ties. The accuracy and clinical acceptability (scores 3 and 4) were calculated for both the PRS and the CEP. The readers' protocolling reliability was measured using Fleiss' kappa.

Results: All four readers agreed on 203/300 cases, three readers on 82/300, and in 15 cases a judge was needed. PRS errors were found by the four readers in 1.0%, 2.7%, 1.0%, and 0.7% of cases, respectively. The accuracy/clinical acceptability of the PRS and the CEP were 84.3%/98.6% and 83.0%/99.3%, respectively. Fleiss' kappa across all readers and all protocols was 0.805.

Conclusion: The PRS achieved accuracy similar to human performance and may help radiologists manage the ever-increasing workload.
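The bag-of-words plus multinomial logistic regression pipeline described in the Materials and Methods could be sketched as follows. This is a minimal illustration assuming scikit-learn; the imaging requests and protocol labels are invented toy examples, and a small dictionary substitution stands in for the CLEVER terminology word replacements (it is not the actual CLEVER lexicon).

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for CLEVER terminology word replacements:
# maps clinical shorthand to normalized terms before feature extraction.
REPLACEMENTS = {"ca": "cancer", "pe": "pulmonary embolism"}

def preprocess(text: str) -> str:
    """Lowercase, tokenize, and apply the toy terminology replacements."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(REPLACEMENTS.get(t, t) for t in tokens)

# Invented example requests and target protocols (not from the study data).
requests = [
    "rule out PE, chest pain",
    "staging lung ca, follow up",
    "interstitial lung disease, HRCT",
    "suspected PE, shortness of breath",
    "lung ca restaging",
    "HRCT fibrosis follow up",
]
protocols = ["CTPA", "Chest staging", "HRCT", "CTPA", "Chest staging", "HRCT"]

# Bag-of-words features feeding a multinomial logistic regression classifier.
model = make_pipeline(
    CountVectorizer(preprocessor=preprocess),
    LogisticRegression(max_iter=1000),
)
model.fit(requests, protocols)

print(model.predict(["suspected PE with pleuritic chest pain"])[0])
```

In the study itself, records with missing fields and rare protocols (<100 occurrences) were removed before training; the sketch omits that filtering step for brevity.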
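Reader reliability in the study is summarized with Fleiss' kappa. A minimal self-contained implementation of the statistic is shown below; the count matrix is an invented toy example (four readers, five cases, three protocol categories), not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of raters assigning subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(ratings)          # number of subjects (cases)
    n = sum(ratings[0])       # raters per subject
    k = len(ratings[0])       # number of categories
    # Mean observed per-subject agreement.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Expected chance agreement from overall category proportions.
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Invented example: 4 readers rate 5 cases into 3 protocol categories.
counts = [
    [4, 0, 0],
    [0, 4, 0],
    [3, 1, 0],
    [0, 0, 4],
    [2, 2, 0],
]
print(round(fleiss_kappa(counts), 3))
```

Perfect agreement on every case yields a kappa of 1.0; the study's observed value of 0.805 indicates substantial but imperfect agreement among the four readers.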
Keywords: chest; computed tomography; natural language processing; protocols.