Introduction: No standardized evaluation tool for fellowship applicant assessment exists. Assessment tools are subject to biases and scoring tendencies that can skew scores and affect rankings. We aimed to develop and evaluate an objective assessment tool for fellowship applicants.
Methods: We detected rater effects in our numerically scaled assessment tool (NST), which consisted of 10 domains rated from 0 to 9. We evaluated each domain, consolidated redundant categories, and removed subjective categories. For the 7 remaining domains, we described each quality and developed a question with a behaviorally anchored rating scale (BARS). Applicants were rated by 6 attendings. Ratings from the 2018 NST were compared with those from the 2020 BARS for distribution of data, skewness, and inter-rater reliability.
Results: Thirty-four applicants were evaluated with the NST and 38 with the BARS. Demographics were similar between groups. The median score on the NST was 8 out of 9; scores <5 appeared in less than 1% of all evaluations. The distribution of data improved with the BARS tool. In the NST, scores from 6 of 10 domains demonstrated moderate skewness and 3 demonstrated high skewness. Three of the 7 domains in the BARS showed moderate skewness and none showed high skewness. Two of 10 domains in the NST versus 5 of 7 domains in the BARS achieved good inter-rater reliability.
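The skewness and reliability comparisons above can be illustrated with a minimal sketch, not drawn from the study itself (which does not name its software or the exact reliability statistic): assuming ratings for one domain are stored as an applicants-by-raters matrix, skewness can be checked with scipy.stats.skew, and a two-way random-effects ICC(2,1), one common choice of inter-rater reliability index, can be computed from the ANOVA mean squares. The rating values below are hypothetical.

import numpy as np
from scipy.stats import skew

# Hypothetical ratings for a single domain: rows = applicants,
# columns = the 6 attending raters.
ratings = np.array([
    [8, 9, 8, 7, 9, 8],
    [6, 7, 7, 6, 8, 7],
    [9, 9, 8, 9, 9, 9],
    [5, 6, 6, 5, 7, 6],
])

# Skewness of the pooled domain scores; common rules of thumb treat
# |skew| between 0.5 and 1 as moderate and |skew| > 1 as high.
domain_skew = skew(ratings.ravel())
print(f"domain skewness: {domain_skew:.2f}")

# Two-way random-effects ICC (absolute agreement, single rater),
# one common inter-rater reliability index; values >= 0.75 are often
# described as "good".
n, k = ratings.shape
grand = ratings.mean()
row_means = ratings.mean(axis=1)   # per-applicant means
col_means = ratings.mean(axis=0)   # per-rater means

ss_rows = k * ((row_means - grand) ** 2).sum()
ss_cols = n * ((col_means - grand) ** 2).sum()
ss_total = ((ratings - grand) ** 2).sum()
ss_err = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
)
print(f"ICC(2,1): {icc_2_1:.2f}")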
Conclusion: Replacing a standard numeric scale with a BARS normalized the distribution of data, reduced skewness, and enhanced inter-rater reliability in our evaluation tool. This provides some validity evidence for improved applicant assessment and ranking.
Keywords: applicants; assessment; raters.
Copyright © 2021 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.