Gathering Validity Evidence for the Bushmaster Assessment Tool

Mil Med. 2024 Dec 18:usae549. doi: 10.1093/milmed/usae549. Online ahead of print.

Abstract

Introduction: The education of military medical providers typically relies on assessments with established passing parameters to help ensure individuals are equipped to care for those in harm's way. Evaluations of medical knowledge are often provided by governing bodies and are supported by strong validity evidence. In contrast, assessing an individual's leadership skills presents a challenge, as tools with robust validity evidence for leadership evaluation are not yet as widely available as clinical assessment tools. This challenge becomes even more complex in simulated environments designed to mimic intense operational conditions.

Materials and methods: An instrument was implemented to assess students in a variety of graded roles with varying responsibilities. Faculty rate each student on character, context, leadership-transcendent skills, communication, and competence using a 4-point Likert scale. This project used confirmatory factor analyses to assess the validity evidence of the instrument used during Bushmaster, with data gathered from 645 School of Medicine students and 170 faculty evaluators from 2021 to 2023, resulting in 2863 evaluations.
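The fit statistics reported for a confirmatory factor analysis (Tucker-Lewis Index, Root Mean Square Error of Approximation) are standard functions of the model and baseline chi-square values. A minimal sketch of those textbook formulas, using purely hypothetical chi-square values chosen for illustration (the actual values are not reported in this abstract):

```python
import math

def tli(chi2_m, df_m, chi2_b, df_b):
    # Tucker-Lewis Index from the model and baseline (null-model)
    # chi-square statistics and their degrees of freedom.
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)

def rmsea(chi2_m, df_m, n):
    # Root Mean Square Error of Approximation; this sketch uses the
    # N - 1 convention (some software divides by N instead).
    return math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))

# Hypothetical values: model chi2 = 50 on 35 df, baseline chi2 = 1000
# on 45 df, sample of 645 students.
print(round(tli(50, 35, 1000, 45), 3))
print(round(rmsea(50, 35, 645), 3))
```

With values in this range, TLI exceeds 0.95 and RMSEA falls below 0.03, the conventional cutoffs the Results section reports against.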

Results: Overall, the one-factor structure was confirmed (Tucker-Lewis Index >0.95, Root Mean Square Error of Approximation <0.03, and Standardized Root Mean Square Residual <0.03). Student and faculty identity had only a small effect on item scores (intraclass correlation <0.19), while the assigned position significantly affected item scores: evaluation scores for the behavioral health officer and platoon leader roles were higher than those for the surgeon role.
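The intraclass correlation cited above quantifies how much of the score variance is attributable to who is being rated (or who is rating) rather than to noise. As a minimal sketch, the one-way random-effects form ICC(1,1) can be computed from a ratings matrix; the abstract does not specify which ICC form the authors used, and the ratings below are invented for illustration:

```python
import numpy as np

def icc1(ratings):
    """One-way random-effects intraclass correlation, ICC(1,1).

    ratings: 2-D array-like, rows = targets (e.g. students),
    columns = repeated ratings of that target.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape                 # n targets, k ratings each
    grand = ratings.mean()
    means = ratings.mean(axis=1)
    # One-way ANOVA mean squares: between targets and within targets
    ms_between = k * np.sum((means - grand) ** 2) / (n - 1)
    ms_within = np.sum((ratings - means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical 4-point Likert ratings: 4 students, 3 evaluators each
scores = [[2, 3, 2],
          [4, 4, 3],
          [1, 2, 2],
          [3, 3, 4]]
print(round(icc1(scores), 3))
```

An ICC near 0, as reported here, indicates that little variance is explained by the target of the rating, which is consistent with the conclusion that role assignment, not student or evaluator identity, drove score differences.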

Conclusions: This study provides validity evidence for the Bushmaster leader assessment tool, confirming its ability to measure leader performance in military medical education. The findings highlight the importance of standardized faculty training in ensuring consistent evaluations, as score variation was driven more by the conditions of the evaluation (such as the assigned role) than by differences among students or evaluators.