With the implementation of the Food Quality Protection Act in 1996, more detailed evaluations of the possible health effects of pesticides on developing organisms have been required. As a result, considerable developmental neurotoxicity (DNT) data have been generated on a variety of endpoints, including developmental changes in motor activity, auditory startle habituation, and various learning and memory parameters. One issue in interpreting these data is the level of variability in the measures used in these studies: excessive variability can obscure treatment-related effects, while, conversely, small but statistically significant changes could be viewed as treatment related when they might in fact fall within the normal range. To aid laboratories in designing DNT studies useful for regulatory consideration, an operational framework for evaluating observed variability in study data has been developed. Elements of the framework suggest how an investigator might approach characterization of variability in the dataset; identification of appropriate datasets for comparison; evaluation of similarities and differences in variability between these datasets; and evaluation of possible sources of the variability, including those related to test conduct and test design. A case study using auditory startle habituation data is then presented, employing the elements of this proposed approach.
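The first framework element, characterization of variability in a dataset, is commonly approached with a simple dispersion statistic such as the coefficient of variation (CV) computed per endpoint and group, which can then be compared across candidate comparison datasets. The sketch below is purely illustrative: the startle-amplitude values, the group names, and the choice of CV as the summary statistic are assumptions for demonstration, not data or methods from the source.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, expressed as a percent."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical peak auditory startle amplitudes (arbitrary units) for
# control groups from two studies -- illustrative numbers only.
control_groups = {
    "study_A": [112.0, 98.0, 130.0, 105.0, 121.0, 94.0],
    "study_B": [140.0, 75.0, 160.0, 55.0, 188.0, 102.0],
}

# Comparing CVs across control datasets can flag a study whose
# variability is unusually high relative to comparable datasets.
for name, amplitudes in control_groups.items():
    print(f"{name}: CV = {coefficient_of_variation(amplitudes):.1f}%")
```

In this sketch, a markedly higher CV in one control dataset than in otherwise comparable ones would prompt the later framework steps: examining possible sources of the difference, such as test conduct or test design.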