Objective: Workplace-based assessments (WBAs) play an important role in the assessment of surgical trainees. Because these assessment tools are used by many faculty, inter-rater reliability is important to consider when interpreting WBA data. Although there is evidence supporting the validity of many of these tools, inter-rater reliability evidence is lacking. This study aimed to evaluate the inter-rater reliability of multiple operative WBA tools utilized in general surgery residency.
Design: General surgery residents and teaching faculty were recorded during 6 general surgery operations. Nine faculty raters each reviewed the 6 videos and rated each resident on performance (using the Society for Improving Medical Professional Learning (SIMPL) Performance Scale and the Operative Performance Rating System (OPRS) Scale), entrustment (using the ten Cate Entrustment-Supervision Scale), and autonomy (using the Zwisch Scale). The ratings were analyzed for inter-rater reliability using percent agreement and intraclass correlations.
Participants: Nine faculty members viewed the videos and assigned ratings for multiple WBAs.
Results: Absolute-agreement intraclass correlation coefficients for each scale ranged from 0.33 to 0.47.
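The absolute-agreement intraclass correlation reported above can be illustrated with a minimal sketch. The code below implements the standard two-way random-effects, single-measure form (Shrout and Fleiss ICC(2,1)) from its ANOVA decomposition; the rating matrix is hypothetical and is not the study's data.

```python
import numpy as np

def icc2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings: n_targets x k_raters array of scores
             (here, residents rated by faculty raters).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)   # per-resident means
    col_means = y.mean(axis=0)   # per-rater means

    # Two-way ANOVA mean squares
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between targets
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = y - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual error

    # Absolute agreement penalizes systematic rater differences via msc
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 6 residents scored by 3 raters on a 1-5 scale
scores = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2], [5, 4, 5], [3, 3, 4]]
print(round(icc2_1(scores), 2))
```

Because the absolute-agreement form charges rater mean differences against reliability, it is the stricter choice when raters' scores are meant to be interchangeable, as with WBAs completed by many different faculty.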
Conclusions: All single-item WBA scales had low to moderate inter-rater reliability. While rater training may improve inter-rater reliability for single observations, many observations by many raters are needed to reliably assess trainee performance in the workplace.
Keywords: general surgery; inter-rater reliability; reliability; workplace-based assessment.
Copyright © 2024 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.