The purpose of this study was to examine the reliability of written expression curriculum-based measurement (WE-CBM) in the context of universal screening within a generalizability theory framework. Students in second through fifth grade (n = 145) participated in the study. The sample included 54% female students, 49% White students, 23% African American students, 17% Hispanic students, 8% Asian students, and 3% students identified as 2 or more races. Of the sample, 8% were English Language Learners and 6% received special education services. Three WE-CBM probes were administered for 7 min each at 3 time points across 1 year. Writing samples were scored for commonly used WE-CBM metrics (e.g., correct minus incorrect word sequences; CIWS). Results suggest that nearly half of the variance in WE-CBM scores is attributable to unsystematic error and that conventional screening procedures (i.e., the use of one 3-min sample) do not yield scores with adequate reliability for relative or absolute decisions about student performance. In most grades, three 3-min writing samples (or 2 longer-duration samples) were required to achieve adequate reliability for relative decisions, and even three 7-min writing samples did not yield adequate reliability for relative decisions about within-year student growth. Implications and recommendations are discussed.
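For context on the coefficients referenced above, generalizability theory distinguishes reliability-like coefficients for relative decisions (rank-ordering students) from those for absolute decisions (comparing scores with a fixed criterion). The following is a minimal sketch of the conventional coefficients, assuming a fully crossed persons (p) x probes (i) x occasions (o) design with $n_i$ probes and $n_o$ occasions; the specific variance-component estimates from this study are not reproduced here.

\[
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},
\qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\Delta},
\]
\[
\sigma^2_\delta = \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{pio,e}}{n_i n_o},
\qquad
\sigma^2_\Delta = \sigma^2_\delta + \frac{\sigma^2_i}{n_i} + \frac{\sigma^2_o}{n_o} + \frac{\sigma^2_{io}}{n_i n_o},
\]

where $\sigma^2_p$ is the universe-score (person) variance and the remaining terms are estimated variance components for the probe and occasion facets and their interactions with persons. Increasing $n_i$ (number of samples) or $n_o$ (number of occasions) shrinks the error terms, which is why multiple or longer samples improve reliability for relative decisions.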