This study evaluated the performance of ChatGPT with GPT-4 Omni (GPT-4o) on the 118th Japanese Medical Licensing Examination, covering both text-only and image-based questions. The model achieved high overall accuracy, with no significant difference in performance between the two question types. Common errors involved clinical judgment and prioritization, underscoring the need for further refinement as artificial intelligence is integrated into medical education and practice.
Keywords: AI technology; ChatGPT; GPT-4o; Japan; accuracy; artificial intelligence; clinical decision-making; image-based questions; medical education; medical licensing examination; reliability.
© Yuki Miyazaki, Masahiro Hata, Hisaki Omori, Atsuya Hirashima, Yuta Nakagawa, Mitsuhiro Eto, Shun Takahashi, Manabu Ikeda. Originally published in JMIR Medical Education (https://mededu.jmir.org).