Perceptual rating scales can be valid, reliable, and convenient tools for evaluating speech outcomes in research and clinical practice. However, they depend on the perceptions of observers. Using too few raters may compromise accuracy, whereas using too many is inefficient. There is therefore a need to determine the minimum number of raters required for a reliable result. In this context, the ideas of Generalizability Theory have become increasingly popular in the behavioral sciences, and suggestions have been made for their application to the assessment of speech-language disorders. Here we review the concepts involved, which are applied in a companion article dealing with speech naturalness data obtained from clients who had recently completed treatment for stuttering. We pay particular attention to the statistical requirements of the theory, including cautions about possible inappropriate use of these techniques. We also offer a new interpretation of the results of the analysis that aims to be more meaningful to most speech-language pathologists.