Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation

Nurs Educ Perspect. 2017 Mar/Apr;38(2):63-68. doi: 10.1097/01.NEP.0000000000000114.

Abstract

Aim: This article reports one approach to developing a standardized training method for establishing the inter- and intrarater reliability of a group of raters for high-stakes testing.

Background: Simulation is used increasingly for high-stakes testing, but little research exists on developing inter- and intrarater reliability among raters.

Method: Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability.

Results: One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores.

Conclusion: There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.

MeSH terms

  • Education, Nursing / methods*
  • Educational Measurement*
  • Faculty, Nursing
  • Humans
  • Reproducibility of Results
  • Simulation Training / methods*
  • Videotape Recording*