QUEST-AI: A System for Question Generation, Verification, and Refinement using AI for USMLE-Style Exams

Suhana Bedi; Scott L Fleming; Chia-Chun Chiang; Keith Morse; Aswathi Kumar; Birju Patel; Jenelle A Jindal; Conor Davenport; Craig Yamaguchi; Nigam H Shah

Pac Symp Biocomput. 2025:30:54-69.

Authors

Suhana Bedi¹, Scott L Fleming, Chia-Chun Chiang, Keith Morse, Aswathi Kumar, Birju Patel, Jenelle A Jindal, Conor Davenport, Craig Yamaguchi, Nigam H Shah

Affiliation

¹ Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.

PMID: 39670361

Abstract

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI's GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system's output by constructing a test set of 50 LLM-generated questions mixed with 50 human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish between LLM-generated and human-generated questions and evaluated the validity of the LLM-generated content. A majority of exam questions generated by QUEST-AI were deemed valid by a panel of three clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing USMLE-style medical exam content, offering a cost-effective and accessible alternative for exam preparation.
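
The three stages named in the abstract (generate, flag, correct) can be read as a simple control flow. The sketch below is only an illustration of that flow and is not the authors' implementation: the `Question` record, the `run_pipeline` function, and the three callables passed into it are hypothetical names, and each callable stands in for an LLM call whose prompts and model settings the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    """Hypothetical record for one generated exam question."""
    text: str
    flagged: bool = False   # set when the verification stage marks it incorrect
    revised: bool = False   # set when the refinement stage rewrote it


def run_pipeline(
    generate: Callable[[], str],
    is_incorrect: Callable[[str], bool],
    correct: Callable[[str], str],
    n: int,
) -> List[Question]:
    """Run generate -> verify -> refine for n questions.

    All three callables are assumptions standing in for LLM calls.
    """
    results: List[Question] = []
    for _ in range(n):
        # Stage 1: generate a draft question.
        question = Question(text=generate())
        # Stage 2: flag the draft if verification judges it incorrect.
        if is_incorrect(question.text):
            question.flagged = True
            # Stage 3: correct only the flagged drafts.
            question.text = correct(question.text)
            question.revised = True
        results.append(question)
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular model or API, so each stage can be swapped or stubbed out for testing.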

MeSH terms

Artificial Intelligence
Clinical Competence / statistics & numerical data
Computational Biology*
Educational Measurement* / methods
Educational Measurement* / standards
Educational Measurement* / statistics & numerical data
Humans
Licensure, Medical* / standards
Students, Medical / statistics & numerical data
United States