Comparative effectiveness of standard vs. AI-assisted PET/CT reading workflow for pre-treatment lymphoma staging: a multi-institutional reader study evaluation

Front Nucl Med. 2024 Jan 11;3:1327186. doi: 10.3389/fnume.2023.1327186. eCollection 2023.

Abstract

Background: Fluorine-18 fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is widely used for staging high-grade lymphoma, with the time required to evaluate such studies varying with the complexity of the case. Integrating artificial intelligence (AI) within the reporting workflow has the potential to improve quality and efficiency. The aims of the present study were to evaluate the influence of an integrated research prototype segmentation tool, implemented within diagnostic PET/CT reading software, on the speed and quality of reporting by readers with varying levels of experience, to assess the effect of the AI-assisted workflow on reader confidence, and to determine whether the tool influenced reporting behaviour.

Methods: Nine blinded reporters (three trainees, three junior consultants and three senior consultants) from three UK centres participated in a two-part reader study. A total of 15 lymphoma staging PET/CT scans were evaluated twice: first, using a standard PET/CT reporting workflow; then, after a 6-week gap, with AI assistance incorporating pre-segmentation of disease sites within the reading software. An even split of segmentations with gold-standard (GS) contours, false-positive (FP) over-contours, or false-negative (FN) under-contours was provided. Read duration was calculated using file logs, while report quality was independently assessed by two radiologists with >15 years of experience. Confidence in AI assistance and in the identification of disease was assessed via online questionnaires for each case.
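The abstract states that read durations were derived from file logs and compared between paired non-AI and AI-assisted reads. The sketch below is illustrative only; the file name, column names, and the choice of a Wilcoxon signed-rank test (a common choice for paired, non-normally distributed timing data) are assumptions and are not taken from the paper.

```python
# Illustrative sketch: derive per-case read durations from reading-software
# file logs and compare paired standard vs. AI-assisted reads.
# All file and column names are hypothetical.
import pandas as pd
from scipy.stats import wilcoxon

# One row per reader/case/workflow, with timestamps for when the study was
# opened and when the report was saved.
logs = pd.read_csv("read_logs.csv", parse_dates=["opened", "report_saved"])
logs["duration_min"] = (logs["report_saved"] - logs["opened"]).dt.total_seconds() / 60

# Pivot to paired columns: standard vs. AI-assisted duration for each reader/case.
paired = logs.pivot_table(index=["reader", "case"],
                          columns="workflow",
                          values="duration_min")

# Paired comparison of read times (assumed test, not confirmed by the paper).
stat, p = wilcoxon(paired["standard"], paired["ai_assisted"])
print(f"Median standard: {paired['standard'].median():.1f} min, "
      f"AI-assisted: {paired['ai_assisted'].median():.1f} min, p = {p:.3f}")
```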

Results: There was a significant decrease in read time between non-AI and AI-assisted reads (median 15.0 vs. 13.3 min, p < 0.001). Sub-analysis confirmed this was true for both junior (14.5 vs. 12.7 min, p = 0.03) and senior consultants (15.1 vs. 12.2 min, p = 0.03) but not for trainees (18.1 vs. 18.0 min, p = 0.2). There was no significant difference in report quality between reads. AI assistance provided a significant increase in confidence of disease identification (p < 0.001), and this held true when the data were split into FN, GS and FP cases. In 19/88 cases, participants failed to identify either the FP (31.8%) or the FN (11.4%) segmentations. This was significantly more frequent for trainees (13/30, 43.3%) than for junior (3/28, 10.7%, p = 0.05) and senior consultants (3/30, 10.0%, p = 0.05).

Conclusions: The study findings indicate that the AI-assisted workflow produced report quality comparable to the standard workflow while providing a modest improvement in reporting speed. Less experienced readers were more influenced by segmentation errors. An AI-assisted PET/CT reading workflow has the potential to increase reporting efficiency without adversely affecting quality, which could reduce costs and report turnaround times. These preliminary findings need to be confirmed in larger studies.

Keywords: PET/CT; artificial intelligence; efficiency; lymphoma; multi-reader study.

Grants and funding

The authors declare financial support was received for the research, authorship, and/or publication of this article.