Background: In medicine, standard setting methodologies have been developed for both selected-response and performance-based assessments. For simulation-based tasks, research efforts have been directed primarily at assessments that incorporate standardized patients. Mannequin-based evaluations often demand complex, time-sensitive, hierarchically ordered, sequential actions that are difficult to evaluate and score. Moreover, collecting reliable proficiency judgments, necessary to estimate meaningful cut points, can be challenging. The purpose of this investigation was to explore whether expert judgments obtained using an examinee-centered standard setting method that was previously validated for standardized patient-based assessments could be used to set defensible standards for acute-care, mannequin-based scenarios.
Methods: Nineteen physicians were recruited to serve as panelists. For each of 12 simulation scenarios, between 8 and 10 performance samples (audio-video recordings), covering the expected ability continuum, were chosen for review. The performance samples were selected from a previously administered evaluation of postgraduate trainees. Based on a consensus definition of readiness to enter unsupervised practice, the panelists made independent judgments of each performance. For each scenario, the association between the panelists' judgments and the assessment scores was summarized and used to estimate a scenario-specific cut score.
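Illustrative sketch: The abstract does not specify how the judgment-score association was converted into a cut score. One common examinee-centered approach, assumed here purely for illustration, regresses the proportion of panelists who judged each performance as ready on the performance score and takes the score at which the fitted proportion equals 0.5. The function name `estimate_cut_score`, the 0.5 threshold, and the data below are hypothetical and are not taken from the study.

```python
from statistics import mean


def estimate_cut_score(scores, proportion_ready, threshold=0.5):
    """Estimate a scenario-specific cut score (illustrative assumption only).

    Fits proportion_ready = intercept + slope * score by ordinary least
    squares and returns the score at which the fitted line equals
    `threshold`. Raises ValueError when the data cannot support a cut
    score, e.g. when judgments do not increase with scores.
    """
    if len(scores) != len(proportion_ready) or len(scores) < 2:
        raise ValueError("need at least two paired observations")

    x_bar = mean(scores)
    y_bar = mean(proportion_ready)
    sxx = sum((x - x_bar) ** 2 for x in scores)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, proportion_ready))

    if sxx == 0:
        raise ValueError("scores have no variance")

    slope = sxy / sxx
    if slope <= 0:
        # Panelists' judgments do not rise with scores, so no meaningful
        # numeric standard can be derived from this scenario.
        raise ValueError("no positive relationship between judgments and scores")

    intercept = y_bar - slope * x_bar
    return (threshold - intercept) / slope


# Hypothetical data for one scenario: five performance samples spanning the
# ability continuum, with the share of panelists who judged each as "ready".
scores = [40, 50, 60, 70, 80]
proportion_ready = [0.0, 0.25, 0.5, 0.75, 1.0]
cut = estimate_cut_score(scores, proportion_ready)
print(f"Estimated cut score: {cut:.1f}")
```

Under this assumed approach, a scenario in which panelists' judgments do not increase with scores yields no usable cut score, which is why the sketch raises an error rather than returning a number in that case.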
Results: For 9 of the scenarios, there was at least a moderately strong relationship between the panelists' aggregate ratings and the performance scores, which allowed meaningful numeric standards to be estimated. For the other 3 scenarios, the aggregate decision rules used by the panelists did not correspond with the achievement measures. For scenarios independently rated by split panels, the estimated cut scores were similar.
Conclusions: An examinee-centered approach, using aggregate expert judgments of audio-video performances, was suitable for setting standards on most acute-care, mannequin-based scenarios. It is necessary, however, to have valid scores for the chosen scenarios and to sample performances across the ability spectrum.