A Comparison of Psychometric Properties of the American Board of Anesthesiology's In-Person and Virtual Standardized Oral Examinations

Acad Med. 2024 Jun 7. doi: 10.1097/ACM.0000000000005782. Online ahead of print.

Abstract

Purpose: The COVID-19 pandemic prompted training institutions and national credentialing organizations to administer examinations virtually. This study compared task difficulty, examiner grading severity, candidate performance, and other psychometric properties between in-person and virtual standardized oral examinations (SOEs) administered by the American Board of Anesthesiology.

Method: This retrospective study included SOEs administered in person from March 2018 through March 2020 and virtually from December 2020 through November 2021. The in-person and virtual SOEs share the same structure, comprising 4 tasks: preoperative evaluation, intraoperative management, postoperative care, and additional topics. The Many-Facet Rasch Model was used to estimate candidate performance, examiner grading severity, and task difficulty for the in-person and virtual SOEs separately; the virtual SOE was equated to the in-person SOE using common examiners and all tasks as anchors. The independent-samples and partially overlapping-samples t tests were used to compare candidate performance and examiner grading severity, respectively, between the 2 formats.
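For readers unfamiliar with the method, a standard formulation of the Many-Facet Rasch Model for a three-facet rating design is sketched below. This is the usual Linacre parameterization; the abstract does not specify the ABA's exact model, so the facet labels here are illustrative.

```latex
% A standard many-facet Rasch (rating scale) formulation, per Linacre.
% Illustrative only: the abstract does not specify the ABA's exact model.
%   P_{nijk} = probability that candidate n, graded by examiner j on task i,
%              receives rating category k
%   B_n = candidate performance, C_j = examiner grading severity,
%   D_i = task difficulty, F_k = threshold of category k over k-1 (all in logits)
\[
  \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) \;=\; B_n - C_j - D_i - F_k
\]
```

Under this model, placing candidates, examiners, and tasks on a common logit scale is what allows severity-adjusted comparison across the two exam formats.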

Results: In-person (n = 3,462) and virtual (n = 2,959) first-time candidates were comparable in age, sex, race and ethnicity, and whether they were U.S. medical school graduates. The mean (standard deviation [SD]) candidate performance was 2.96 (1.76) logits for the virtual SOE, which was statistically significantly better than that for the in-person SOE (mean [SD], 2.86 [1.75]; Welch independent-samples t test, P = .02); however, the effect size was negligible (Cohen d = 0.06). The difference in the grading severity of examiners who rated the in-person (n = 398; mean [SD], 0.00 [0.73]) vs virtual (n = 341; mean [SD], 0.07 [0.77]) SOE was not statistically significant (Welch partially overlapping-samples t test, P = .07).
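As an illustration, the reported candidate-performance comparison can be reproduced approximately from the summary statistics above. This is a sketch only: the actual study analyzed candidate-level Rasch estimates, and the partially overlapping-samples t test used for examiner severity (some examiners rated both formats) has no standard SciPy implementation, so it is omitted here.

```python
# Sketch: Welch's t test and Cohen's d for candidate performance,
# computed from the abstract's summary statistics (means, SDs, ns).
from math import sqrt
from scipy.stats import ttest_ind_from_stats

m_in, sd_in, n_in = 2.86, 1.75, 3462  # in-person SOE (logits)
m_vr, sd_vr, n_vr = 2.96, 1.76, 2959  # virtual SOE (logits)

# Welch's independent-samples t test (unequal variances assumed)
t, p = ttest_ind_from_stats(m_vr, sd_vr, n_vr, m_in, sd_in, n_in,
                            equal_var=False)

# Cohen's d with a pooled standard deviation
sd_pooled = sqrt(((n_in - 1) * sd_in**2 + (n_vr - 1) * sd_vr**2)
                 / (n_in + n_vr - 2))
d = (m_vr - m_in) / sd_pooled

print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")  # p ~ .02, d ~ 0.06
```

Running this recovers the abstract's figures (P = .02, Cohen d = 0.06), underscoring that the statistically significant difference is negligible in practical terms.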

Conclusions: Candidate performance and examiner grading severity were comparable between the in-person and virtual SOEs, supporting the reliability and validity of the virtual oral exam in this large-volume, high-stakes setting.