Evaluating ChatGPT-4V in chest CT diagnostics: a critical image interpretation assessment

Jpn J Radiol. 2024 Oct;42(10):1168-1177. doi: 10.1007/s11604-024-01606-3. Epub 2024 Jun 13.

Abstract

Purpose: To assess the diagnostic accuracy of ChatGPT-4V in interpreting a set of four chest CT slices per case across COVID-19, non-small cell lung cancer (NSCLC), and control cases, thereby evaluating its potential as an AI tool in radiological diagnostics.

Materials and methods: In this retrospective study, 60 CT scans from The Cancer Imaging Archive, covering COVID-19, NSCLC, and control cases, were analyzed using ChatGPT-4V. A radiologist selected four CT slices from each scan for evaluation. ChatGPT-4V's interpretations were compared against the gold standard diagnoses and assessed by two radiologists. Statistical analyses focused on accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and also examined the impact of pathology location and lobe involvement.
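
To make the metrics named above concrete, the following is a minimal sketch of how sensitivity, specificity, PPV, and NPV can be computed one-vs-rest for a three-class problem such as this one. It is not the study's analysis code: the function name `one_vs_rest_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)
    accuracy: float     # (TP + TN) / all cases


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest_metrics(truth, predicted, target) -> BinaryMetrics:
    """Treat `target` as the positive class and every other label as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t == target and p == target:
            tp += 1
        elif t != target and p == target:
            fp += 1
        elif t == target and p != target:
            fn += 1
        else:
            tn += 1
    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
    )


# Invented toy labels (NOT data from the study), for illustration only.
toy_truth = ["COVID-19", "NSCLC", "control", "NSCLC", "control", "COVID-19"]
toy_pred = ["COVID-19", "control", "control", "NSCLC", "NSCLC", "control"]

nsclc = one_vs_rest_metrics(toy_truth, toy_pred, "NSCLC")
print(nsclc)
```

With these toy labels, NSCLC has one true positive, one false negative, one false positive, and three true negatives, so sensitivity is 0.5 and specificity is 0.75. Calling the function once per category gives the per-class figures in the form reported in the Results.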

Results: ChatGPT-4V showed an overall diagnostic accuracy of 56.76%. For NSCLC, sensitivity was 27.27% and specificity was 60.47%. For COVID-19, sensitivity was 13.64% and specificity was 64.29%. For control cases, sensitivity was 31.82% and specificity was 95.24%. The highest sensitivity (83.33%) was observed in cases involving all lung lobes. Chi-squared analysis indicated significant differences in sensitivity across categories and in relation to the location and lobar involvement of pathologies.
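
As an illustration of the kind of chi-squared comparison mentioned above, the sketch below computes Pearson's chi-squared statistic for a contingency table of detected versus missed cases per category. The abstract does not specify which variant of the test was used, so this is an assumption; the function name `chi_squared_statistic` and the counts are hypothetical and are not taken from the study.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts, e.g. one row per
    diagnostic category and one column each for detected / missed.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented counts (NOT study data): rows are categories,
# columns are [detected, missed].
hypothetical_counts = [
    [10, 10],
    [5, 15],
    [15, 5],
]

stat, dof = chi_squared_statistic(hypothetical_counts)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

For these invented counts every expected cell is 10, giving a statistic of 10.0 with 2 degrees of freedom. Turning the statistic into a p-value requires the chi-squared distribution, which a statistics library such as SciPy (`scipy.stats.chi2_contingency`) provides.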

Conclusion: ChatGPT-4V demonstrated variable diagnostic performance in chest CT interpretation, performing best in specific scenarios such as cases involving all lung lobes. This underscores the challenges facing cross-modal AI models like ChatGPT-4V in radiology and points to significant areas for improvement before they can be considered dependable. The study emphasizes the importance of enhancing these models for broader, more reliable medical use.

Keywords: AI (artificial intelligence); ChatGPT-4V; Computed tomography; Computer-aided diagnosis (CAD).

MeSH terms

  • Aged
  • COVID-19* / diagnostic imaging
  • Carcinoma, Non-Small-Cell Lung* / diagnostic imaging
  • Coronavirus Infections / diagnostic imaging
  • Female
  • Humans
  • Lung / diagnostic imaging
  • Lung Neoplasms* / diagnostic imaging
  • Male
  • Middle Aged
  • Pandemics
  • Pneumonia, Viral / diagnostic imaging
  • Radiographic Image Interpretation, Computer-Assisted / methods
  • Radiography, Thoracic / methods
  • Reproducibility of Results
  • Retrospective Studies
  • SARS-CoV-2
  • Sensitivity and Specificity*
  • Tomography, X-Ray Computed* / methods