Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation

Nurs Educ Perspect. 2017 Mar/Apr;38(2):63-68. doi: 10.1097/01.NEP.0000000000000114.

Abstract

Aim: This article reports one approach to developing a standardized training method for establishing the inter- and intrarater reliability of a group of raters for high-stakes testing.

Background: Simulation is used increasingly for high-stakes testing, but little research exists on developing inter- and intrarater reliability among raters.

Method: Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability.

Results: One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores.

Conclusion: There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.

MeSH terms

  • Education, Nursing / methods*
  • Educational Measurement*
  • Faculty, Nursing
  • Humans
  • Reproducibility of Results
  • Simulation Training / methods*
  • Videotape Recording*