Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evaluation of Accuracy in Text-Only and Image-Based Questions

JMIR Med Educ. 2024 Dec 24;10:e63129. doi: 10.2196/63129.

Abstract

This study evaluated the performance of ChatGPT with GPT-4 Omni (GPT-4o) on the 118th Japanese Medical Licensing Examination. The study focused on both text-only and image-based questions. The model demonstrated a high level of accuracy overall, with no significant difference in performance between text-only and image-based questions. Common errors included clinical judgment mistakes and prioritization issues, underscoring the need for further improvement in the integration of artificial intelligence into medical education and practice.

Keywords: AI technology; ChatGPT; GPT-4o; Japan; accuracy; application; artificial intelligence; clinical decision-making; decision-making; image-based; images; medical education; medical licensing examination; reliability.

MeSH terms

  • Artificial Intelligence
  • East Asian People
  • Educational Measurement* / methods
  • Humans
  • Japan
  • Licensure, Medical*