The Interrater Reliability of a Coding System for Measuring Mental Health Professionals' Decisions and Actions

Kimberly D Becker; Eleanor G Wu; Jonathan G Westman; Meredith R Boyd; Karen Guan; Davielle Lakind; Wendy Chu; Kendra S Knudsen; W Joshua Bradley; Alayna L Park; Tara Kenworthy LaMarca; Emily Lang; Bruce F Chorpita

doi:10.1080/15374416.2024.2384027

J Clin Child Adolesc Psychol. 2024 Aug 13:1-17. doi: 10.1080/15374416.2024.2384027. Online ahead of print.

Affiliations

¹ Department of Psychology, University of South Carolina.
² Department of Psychology, University of California.

PMID: 39137271

Abstract

Objective: The clinical decisions and actions of evidence-based practice in psychology (EBPP) are largely underspecified and poorly understood, in part due to the lack of measurement methods. We tested the reliability of a behavioral coding system that characterizes a flow of interrelated activities that includes problem detection and prioritization, intervention selection and implementation, and review of intervention integrity and impact.

Method: The context included two publicly funded youth mental health service organizations located in geographically distinct and underresourced communities in the U.S. where service inequities are common. We sampled 84 digitally recorded and transcribed supervision events involving professionals who were mostly women (93.02%) and BIPOC (86.04%) and whose self-reported race/ethnicity matched the youth populations they served. We coded these events for activities (e.g., considering) and their predicate content (i.e., problems or practices) and examined the reliability of these codes when applied to excerpts (i.e., small contiguous units of dialogue) as well as to complete events.

Results: Interrater reliability estimates showed that, overall, coders reliably rated the occurrence and extensiveness of activities and content. Excerpt coding was generally more reliable than event coding. However, mathematical aggregation of excerpt codes offered a superior method for estimating event codes reliably, reducing individual subjectivity while providing an event-level synthesis of activities that are grounded in excerpt-level details.
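
As an illustration only, the idea of deriving event codes from excerpt codes and then checking agreement between two coders can be sketched as follows. The abstract does not name the reliability statistic or the aggregation rule, so both are assumptions here: Cohen's kappa stands in for the reliability estimate, and an event is marked as containing an activity if any of its excerpts was coded for that activity. The data, `event_ids`, `coder_a`, and `coder_b` are hypothetical.

```python
from collections import defaultdict


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Proportion of units on which the two coders agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each coder's marginal rates.
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both coders used the same single code throughout
    return (observed - expected) / (1 - expected)


def aggregate_to_events(event_ids, excerpt_codes):
    """Assumed rule: an event gets code 1 if any of its excerpts has code 1."""
    events = defaultdict(int)
    for event, code in zip(event_ids, excerpt_codes):
        events[event] = max(events[event], code)
    return [events[e] for e in sorted(events)]


# Hypothetical occurrence codes (1 = activity present) for ten excerpts
# drawn from four supervision events, rated independently by two coders.
event_ids = ["e1", "e1", "e1", "e2", "e2", "e3", "e3", "e3", "e4", "e4"]
coder_a = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
coder_b = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

excerpt_kappa = cohen_kappa(coder_a, coder_b)
event_kappa = cohen_kappa(
    aggregate_to_events(event_ids, coder_a),
    aggregate_to_events(event_ids, coder_b),
)
print(f"excerpt-level kappa: {excerpt_kappa:.2f}")
print(f"aggregated event-level kappa: {event_kappa:.2f}")
```

The toy data are not meant to reproduce the study's findings; they only show how excerpt-level codes can be rolled up into event-level codes by a fixed rule rather than by a coder's holistic judgment.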

Conclusions: The assessment of clinical decisions and actions has the potential to unpack the black box of EBPP, with different methods best suited to different research questions and resource considerations.