Recent advancements in natural language processing (NLP) have profoundly transformed the medical industry, enhancing large-cohort data analysis, improving diagnostic capabilities, and streamlining clinical workflows. Among the leading tools in this domain is ChatGPT 4.0 (OpenAI, San Francisco, California, US), a commercial NLP model widely used across various applications. This study evaluates the diagnostic performance of ChatGPT 4.0 in thoracic imaging by assessing its ability to answer diagnostic questions in this field. We used the model to answer multiple-choice questions derived from thoracic imaging scenarios and then performed statistical analysis to assess its accuracy and its variability across subgroups. Overall, the model achieved an accuracy of 84.9% on thoracic radiology questions, but performance varied significantly among subgroups. It achieved perfect scores in the terminology and diagnostic signs categories and performed strongly in the intensive care and normal anatomy categories, with accuracies of 90% and 80%, respectively. In the pathology subgroups, ChatGPT achieved an average accuracy of 89.1%, excelling in particular at diagnosing infectious pneumonia and atelectasis, though it scored lower on diffuse alveolar disease (66.7%). For disease-related questions, the mean accuracy was 79.1%, with perfect scores in several specific subcategories but notably lower accuracy for vascular disease (50%) and lung cancer (66.7%). In conclusion, while ChatGPT 4.0 shows strong potential in diagnosing thoracic conditions, the variability identified here underscores the need for ongoing research and refinement of its transformer architecture to enhance its reliability and applicability in broader clinical and patient care settings.
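To make the subgroup analysis concrete, the following is a minimal sketch of how overall and per-subgroup accuracy could be tallied from per-question results. The example records, subgroup names, and the use of a Wilson score interval for the 95% confidence bounds are assumptions for illustration only and do not reflect the authors' actual statistical pipeline.

```python
# Sketch: tally per-subgroup accuracy from (subgroup, correct?) records.
# Data values below are placeholders, not the study's actual results.
import math
from collections import defaultdict

# Hypothetical per-question outcomes: (subgroup, model answered correctly)
results = [
    ("terminology", True), ("terminology", True),
    ("intensive care", True), ("intensive care", False),
    ("vascular disease", True), ("vascular disease", False),
]

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

tally = defaultdict(lambda: [0, 0])  # subgroup -> [n_correct, n_total]
for subgroup, correct in results:
    tally[subgroup][0] += int(correct)
    tally[subgroup][1] += 1

total_correct = sum(c for c, _ in tally.values())
total_n = sum(n for _, n in tally.values())
print(f"overall: {total_correct / total_n:.1%}")
for subgroup, (c, n) in sorted(tally.items()):
    lo, hi = wilson_ci(c, n)
    print(f"{subgroup}: {c}/{n} = {c / n:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```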
Keywords: ai; gpt; machine learning; natural language processing; nlp; thoracic imaging.
Copyright © 2024, Lotfian et al.