Inter-school variations in the standard of examiners' graduation-level OSCE judgements

Peter Yeates; Adriano Maluf; Gareth McCray; Ruth Kinston; Natalie Cope; Kathy Cullen; Vikki O'Neill; Aidan Cole; Ching-Wa Chung; Rhian Goodfellow; Rebecca Vallender; Sue Ensaff; Rikki Goddard-Fuller; Robert McKinley

doi:10.1080/0142159X.2024.2372087

Med Teach. 2024 Jul 8:1-9. doi: 10.1080/0142159X.2024.2372087. Online ahead of print.

Affiliations

¹ School of Medicine, Keele University, Keele, United Kingdom.
² De Montfort University, Leicester, United Kingdom.
³ School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom.
⁴ School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, United Kingdom.
⁵ School of Medicine, Cardiff University, Cardiff, United Kingdom.
⁶ Christie Education, Christie Hospitals NHS Foundation Trust, Manchester, United Kingdom.

PMID: 38976711
DOI: 10.1080/0142159X.2024.2372087

Abstract

Introduction: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and resulting impact on students' pass/fail categorisation.

Methods: The same 6-station formative OSCE ran asynchronously in 4 medical schools, with 2 parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner-cohorts and (b) schools, and (2) data linkage to adjust for the influence of examiner-cohorts. We calculated score impact and change in pass/fail categorisation by school.
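The linkage idea can be illustrated with a minimal sketch. It is not the study's analysis: the abstract does not specify the statistical model, so a simple mean-offset adjustment is assumed here in its place. All cohort names, scores, and the pass mark are hypothetical values chosen for illustration. Because every examiner-cohort scores the same comparator videos, differences in their mean video scores can be read as differences in stringency and removed from the live scores.

```python
from statistics import mean

# Hypothetical percentage scores that each examiner-cohort gave to the SAME
# comparator videos (illustrative values, not data from the study).
video_scores = {
    "school1_circuitA": [60.0, 70.0, 50.0],
    "school2_circuitA": [50.0, 60.0, 40.0],
}

# Hypothetical 'live' student scores, grouped by the cohort that examined them.
live_scores = {
    "school1_circuitA": {"s1": 62.0, "s2": 48.0},
    "school2_circuitA": {"s3": 47.0, "s4": 58.0},
}

PASS_MARK = 50.0  # hypothetical cut score


def cohort_offsets(videos):
    """Each cohort's mean video score minus the mean across cohorts.

    A positive offset marks a cohort that scored the shared videos more
    leniently than average; a negative offset marks a more stringent one.
    """
    cohort_means = {cohort: mean(scores) for cohort, scores in videos.items()}
    grand_mean = mean(list(cohort_means.values()))
    return {cohort: m - grand_mean for cohort, m in cohort_means.items()}


def adjust(live, offsets):
    """Subtract each cohort's offset from the live scores it awarded."""
    return {
        cohort: {student: score - offsets[cohort] for student, score in students.items()}
        for cohort, students in live.items()
    }


def failure_rate(scores):
    """Percentage of students scoring below PASS_MARK."""
    flat = [s for students in scores.values() for s in students.values()]
    return 100.0 * sum(s < PASS_MARK for s in flat) / len(flat)


offsets = cohort_offsets(video_scores)
adjusted = adjust(live_scores, offsets)
print(offsets)
print(failure_rate(live_scores), failure_rate(adjusted))
```

In this toy data the second cohort scores the shared videos lower than the first, so its students' scores rise after adjustment and one moves from fail to pass, while the first cohort's students' scores fall. This is the kind of shift in pass/fail categorisation the comparison is designed to detect.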

Results: On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school, with adjustment reducing failure rate from 39.13% to 8.70% (school 2) whilst increasing failure from 0.00% to 21.74% (school 4).

Discussion: Whilst the formative context may partly account for differences, these findings raise the question of whether variations exist between medical schools in examiners' judgements. Such variations may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for comparisons.

Keywords: Assessment; OSCE; equivalence; medical education.