The performance of ChatGPT-4.0o in medical imaging evaluation: a cross-sectional study

J Educ Eval Health Prof. 2024:21:29. doi: 10.3352/jeehp.2024.21.29. Epub 2024 Oct 31.

Abstract

This study investigated the performance of ChatGPT-4.0o in evaluating the quality of positioning in radiographic images. Thirty radiographs depicting a variety of knee, elbow, ankle, hand, pelvis, and shoulder projections were produced using anthropomorphic phantoms and uploaded to ChatGPT-4.0o. For each image, the model was prompted to provide a solution that identified any positioning errors, justified them, and suggested improvements. A panel of radiographers assessed the solutions against established positioning criteria for radiographic quality, using a grading scale of 1–5. In only 20% of projections (6 of 30) did ChatGPT-4.0o correctly recognize all errors with justifications and offer correct suggestions for improvement. The most commonly occurring score was 3 (9 cases, 30%), wherein the model recognized at least 1 specific error and provided a correct improvement. The mean score was 2.9 out of 5. Overall, the model demonstrated low accuracy, with most projections receiving only partially correct solutions. The findings reinforce the importance of robust radiography education and clinical experience.

Keywords: Artificial intelligence; Diagnostic imaging; Radiography; Radiology.

MeSH terms

  • Humans
  • Patient Positioning
  • Phantoms, Imaging*
  • Radiography / methods
  • Radiology / education