Reliability and validity of checklists and global ratings by standardized students, trained raters, and faculty raters in an objective structured teaching environment

Mark Quirk; Kathleen Mazor; Heather-Lyn Haley; Scott Wellman; David Keller; David Hatem; Lisa A Keller

doi:10.1207/s15328015tlm1703_2

Teach Learn Med. 2005 Summer;17(3):202-9. doi: 10.1207/s15328015tlm1703_2.

Authors

Mark Quirk¹, Kathleen Mazor, Heather-Lyn Haley, Scott Wellman, David Keller, David Hatem, Lisa A Keller

Affiliation

¹ University of Massachusetts Medical School, Community Faculty Development Center, Worcester, Massachusetts 01655, USA.

PMID: 16042515

Abstract

Background: Objective structured teaching exercises (OSTEs) are relatively new in medical education, and few studies have reported on their reliability and validity.

Purpose: To systematically examine the impact of OSTE design decisions, including number of cases, choice of raters, and type of scoring systems used.

Methods: We examined the impact of number of cases and raters using generalizability theory. We also compared scores from standardized students (SS), faculty raters (FR) and trained graduate student raters (TR), and examined the relation between behavior checklist ratings and global perception scores.

Results: Generalizability (g) coefficients for checklist scores were higher for SSs than TRs. The g estimates based on SSs' global scores were higher than g estimates for FRs. SSs' checklist scores were higher than TRs' checklist scores, and SSs' global evaluations were higher than FRs' and TRs' global scores. TRs' global perceptions correlated more highly with checklist scores than SSs' global perceptions did.

Conclusions: SSs provide more generalizable checklist scores than TRs. Generalizability estimates for global scores from SSs and FRs were comparable. SSs are lenient raters compared to TRs and FRs.

MeSH terms

Education, Medical / standards*
Educational Measurement / standards*
Faculty*
Humans
Reproducibility of Results
Research Design
Students, Medical*
Teaching / methods*
Teaching / standards