Inter-rater reliability and content validity of the measurement tool for portfolio assessments used in the Introduction to Clinical Medicine course at Ewha Womans University College of Medicine: a methodological study

Dong-Mi Yoo; Jae Jin Han

doi:10.3352/jeehp.2024.21.39

J Educ Eval Health Prof. 2024:21:39. doi: 10.3352/jeehp.2024.21.39. Epub 2024 Dec 10.

Authors

Dong-Mi Yoo¹, Jae Jin Han²

Affiliations

¹ Department of Medical Education, College of Medicine, The Catholic University of Korea, Seoul, Korea.
² Department of Medical Education & Thoracic Surgery, Ewha Womans University College of Medicine, Seoul, Korea.

Abstract

Purpose: This study aimed to examine the reliability and validity of a measurement tool for portfolio assessments in medical education. Specifically, it investigated the consistency of scoring among raters and the appropriateness of the assessment criteria as judged by an expert panel.

Methods: A cross-sectional observational study was conducted from September to December 2018 for the Introduction to Clinical Medicine course at the Ewha Womans University College of Medicine. Data were collected for 5 randomly selected portfolios scored by a gold-standard rater and 6 trained raters. An expert panel assessed the validity of 12 assessment items using the content validity index (CVI). Statistical analysis included Pearson correlation coefficients for rater alignment, the intraclass correlation coefficient (ICC) for inter-rater reliability, and the CVI for item-level validity.
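The three statistics named in the Methods can be illustrated with a minimal Python sketch. The abstract does not state which ICC form or which relevance scale was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) and an item-level CVI defined as the proportion of experts rating an item 3 or 4 on a 4-point relevance scale. The function names and all scores below are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def icc2_1(scores):
    """ICC(2,1), assumed form: two-way random effects, absolute agreement.

    scores[i][j] is the score given to portfolio i by rater j.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: portfolios (rows), raters (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts rating the item as relevant.

    Assumes a 4-point scale on which 3 and 4 count as relevant.
    """
    return sum(r >= relevant_min for r in ratings) / len(ratings)


# Hypothetical scores: 6 portfolios (rows) rated by 4 raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc2_1(example_scores)

# Alignment of one rater (column 1) with a reference rater (column 0).
reference = [row[0] for row in example_scores]
rater = [row[1] for row in example_scores]
example_r = pearson_r(reference, rater)

# Hypothetical relevance ratings from 4 experts for a single item.
example_cvi = item_cvi([4, 4, 3, 2])

print(round(example_icc, 2), round(example_r, 2), example_cvi)
```

In this made-up example the raters rank the portfolios similarly but differ in overall severity, so a pairwise correlation can be high while the absolute-agreement ICC stays low. This is one reason the two statistics are reported separately.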

Results: Rater 1 had the highest Pearson correlation (0.8916) with the gold-standard rater, while Rater 5 had the lowest (0.4203). The ICC for all raters was 0.3821, improving to 0.4415 after excluding Raters 1 and 5, a relative increase of 15.6%. All assessment items met the CVI threshold of ≥0.75, with some achieving a perfect score (CVI=1.0). However, items such as "sources" and "level and degree of performance" showed lower validity (CVI=0.75).
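The reported improvement in the ICC is a relative change, not a difference in ICC points. A minimal sketch of the arithmetic, using only the two ICC values quoted above (the function and variable names are illustrative):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


icc_all_raters = 0.3821   # ICC with all raters, as reported
icc_subset = 0.4415       # ICC after excluding Raters 1 and 5, as reported

gain = relative_increase(icc_all_raters, icc_subset)
print(f"{gain:.1%}")
```

Computed from the rounded ICC values, the gain comes out at about 15.5%, in line with the reported 15.6%. The small gap presumably reflects rounding of the published ICCs.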

Conclusion: The present measurement tool for portfolio assessments demonstrated moderate reliability and strong validity, supporting its use as a credible tool. Further faculty training is needed to improve the reliability of portfolio assessment.

Keywords: Clinical medicine; Educational status; Medical education; Reproducibility of results; Republic of Korea.

Publication types

Observational Study

MeSH terms

Clinical Competence / standards
Clinical Medicine / education
Cross-Sectional Studies
Education, Medical / methods
Education, Medical, Undergraduate / methods
Educational Measurement* / methods
Educational Measurement* / standards
Humans
Observer Variation
Reproducibility of Results
Schools, Medical / standards
Universities