Data in medical sciences often have a hierarchical structure, with lower-level units (e.g. children) nested in higher-level units (e.g. departments). Several specific but frequently studied settings, mainly in longitudinal and family research, involve a large number of units that tend to be quite small; units containing only one element are referred to as singletons. Regardless of sparseness, hierarchical data should be analyzed with appropriate methodology, such as linear mixed models. Using a simulation study based on the structure of a data example on Ceftriaxone consumption in hospitalized children, we assess the impact of an increasing proportion of singletons (0-95%), in data with a low, medium, or high intracluster correlation, on the stability of linear mixed model parameter estimates, confidence interval coverage and F test performance. Techniques frequently used in the presence of singletons include ignoring clustering, dropping the singletons from the analysis, and grouping the singletons into an artificial unit. We show that both the fixed and random effects estimates and their standard errors are stable in the presence of an increasing proportion of singletons. We demonstrate that ignoring clustering and dropping singletons should be avoided, as both lead to biased standard error estimates. Grouping the singletons into an artificial unit might be considered, although the linear mixed model performs better even when the proportion of singletons is high. We conclude that the linear mixed model is stable in the presence of singletons when both lower- and higher-level sample sizes are fixed. In this setting, the use of remedial measures, such as ignoring clustering and grouping or removing singletons, should be discouraged.
Keywords: F test; hierarchical data; intracluster correlation; performance characteristics; sparseness.
© 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
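The simulation design summarized above can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`cluster_sizes`, `simulate`, `drop_singletons`, `pool_singletons`), the even spreading of the remaining units over non-singleton clusters, and the split of a total variance into a cluster variance ICC·σ² and a residual variance (1 − ICC)·σ² are assumptions chosen for illustration. The sketch keeps both the number of lower-level units and the number of clusters fixed while varying the proportion of singletons, generates data from a random-intercept model, and applies two of the remedial measures mentioned (dropping singletons and pooling them into one artificial unit).

```python
import random
from collections import Counter


def cluster_sizes(n_total, n_clusters, prop_singletons):
    """Hypothetical design: n_total units in n_clusters clusters, of which
    about prop_singletons are singletons; remaining units are spread as
    evenly as possible over the non-singleton clusters (size >= 2)."""
    if not 0.0 <= prop_singletons <= 1.0:
        raise ValueError("prop_singletons must lie in [0, 1]")
    n_single = round(prop_singletons * n_clusters)
    n_multi = n_clusters - n_single
    remaining = n_total - n_single
    if n_multi == 0:
        if remaining != 0:
            raise ValueError("all clusters are singletons but units remain")
        return [1] * n_single
    if remaining < 2 * n_multi:
        raise ValueError("not enough units for non-singleton clusters")
    base, extra = divmod(remaining, n_multi)
    multi = [base + 1 if k < extra else base for k in range(n_multi)]
    return [1] * n_single + multi


def simulate(sizes, icc, beta0=0.0, total_var=1.0, seed=None):
    """Random-intercept model y_ij = beta0 + u_j + e_ij with
    var(u_j) = icc * total_var and var(e_ij) = (1 - icc) * total_var.
    Returns a list of (cluster_id, y) records."""
    rng = random.Random(seed)
    sigma_u = (icc * total_var) ** 0.5
    sigma_e = ((1.0 - icc) * total_var) ** 0.5
    data = []
    for j, n_j in enumerate(sizes):
        u_j = rng.gauss(0.0, sigma_u)  # cluster-level random intercept
        for _ in range(n_j):
            data.append((j, beta0 + u_j + rng.gauss(0.0, sigma_e)))
    return data


def drop_singletons(data):
    """Remedial measure: discard records from clusters of size 1."""
    counts = Counter(j for j, _ in data)
    return [(j, y) for j, y in data if counts[j] > 1]


def pool_singletons(data, pooled_id=-1):
    """Remedial measure: relabel all singletons as one artificial cluster."""
    counts = Counter(j for j, _ in data)
    return [(pooled_id if counts[j] == 1 else j, y) for j, y in data]


if __name__ == "__main__":
    sizes = cluster_sizes(n_total=1000, n_clusters=200, prop_singletons=0.5)
    data = simulate(sizes, icc=0.2, seed=1)
    print(len(data), len(drop_singletons(data)))  # 1000 900
    print(len({j for j, _ in pool_singletons(data)}))  # 101
```

With 1,000 units in 200 clusters and 50% singletons, the helpers produce 100 singletons and 100 clusters of size 9; dropping singletons leaves 900 records, while pooling yields 101 clusters. Fitting the linear mixed model itself would require a dedicated routine, such as `mixedlm` in statsmodels or `lmer` in R's lme4 package, and is not shown here.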