Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In-Service Exam

OTO Open. 2023 Nov 29;7(4):e98. doi: 10.1002/oto2.98. eCollection 2023 Oct-Dec.

Abstract

Objectives: This study seeks to determine the potential use and reliability of a large language model for answering questions in a subspecialized area of medicine, specifically practice examination questions in otolaryngology-head and neck surgery, and to assess its current efficacy for surgical trainees and learners.

Study design and setting: All available questions from a public, paid-access question bank were manually entered into ChatGPT.

Methods: Outputs from ChatGPT were compared against the benchmark of the answers and explanations provided by the question bank. Responses were assessed in 2 domains: accuracy of answers and comprehensiveness of explanations.

Results: Overall, ChatGPT answered 53% of questions correctly and provided a correct explanation for 54%. Answer and explanation accuracy declined as question difficulty increased.

Conclusion: Currently, artificial intelligence-driven learning platforms are not robust enough to serve as reliable medical education resources for learners in subspecialty-specific patient decision-making scenarios.

Keywords: BoardVitals; ChatGPT; artificial intelligence; in-service exams; large language models; otolaryngology residency training.