A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative

J Nonparametr Stat. 2012 Sep 1;24(3):797-808. doi: 10.1080/10485252.2012.672647. Epub 2012 May 1.

Abstract

Rank-based tests are alternatives to likelihood-based tests, popularized by their relative robustness and elegant underlying mathematical theory. There has been a surge in research activity in this area in recent years, as a number of researchers are working to develop and extend rank-based procedures to clustered dependent data, which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. The purpose of this paper is to test the symmetry of a marginal distribution under clustered data. However, unlike most other papers in the area, we consider the possibility that the cluster size is a random variable whose distribution depends on the distribution of the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not controlled by the experimenter or statistician), so that the size of a cluster may carry information about the distribution of data values within it. Under the scenario of an informative cluster size, attempts to use some form of variance-adjusted sign or signed rank test would fail, since such tests would not maintain the correct size under the null hypothesis of marginal symmetry. To overcome this difficulty, Datta and Satten (2008; Biometrics, 64, 501-507) proposed a Wilcoxon-type signed rank test based on the principle of within-cluster resampling. In this paper we study this problem in more generality by introducing a class of valid tests employing a general score function. The asymptotic null distribution of these tests is obtained. A simulation study shows that a more general choice of the score function can sometimes result in greater power than the Datta and Satten test; furthermore, this development offers the user a wider choice. We illustrate our tests using a real data example on spinal cord injury patients.
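
To make the within-cluster resampling principle concrete, the following is a minimal Monte Carlo sketch in Python. It is not the closed-form statistic of Datta and Satten, nor the general score-function class studied in this paper. It repeatedly draws one observation per cluster, computes an ordinary Wilcoxon signed rank statistic on each draw, and averages. The function name `wcr_signed_rank`, its parameters, and the variance combination (mean within-resample null variance minus the between-resample variance, the usual within-cluster resampling recipe) are assumptions made for illustration, and continuous data without ties or zeros are assumed.

```python
import numpy as np

def wcr_signed_rank(clusters, n_resamples=2000, seed=0):
    """Hypothetical Monte Carlo within-cluster resampling signed rank test.

    clusters: list of 1-D sequences, one per cluster (sizes may differ).
    Returns (mean statistic, variance estimate, z-score).
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(n_resamples)
    null_vars = np.empty(n_resamples)
    for q in range(n_resamples):
        # Draw exactly one observation from each cluster, so every cluster
        # gets equal weight regardless of its size.
        x = np.array([c[rng.integers(len(c))] for c in clusters], dtype=float)
        # Ranks 1..m of the absolute values (ties broken arbitrarily; assumed absent).
        ranks = np.abs(x).argsort().argsort() + 1.0
        stats[q] = np.sum(np.sign(x) * ranks)
        # Null variance of the signed rank statistic for one independent sample.
        null_vars[q] = np.sum(ranks ** 2)
    t_bar = stats.mean()
    # Assumed recipe: average within-resample variance minus between-resample variance.
    var_hat = null_vars.mean() - stats.var(ddof=1)
    # The estimate can be non-positive in small samples; report NaN in that case.
    z = t_bar / np.sqrt(var_hat) if var_hat > 0 else float("nan")
    return t_bar, var_hat, z
```

Under marginal symmetry about zero, the returned z-score would be compared with a standard normal reference distribution. Because each resample uses one observation per cluster, cluster size does not influence the weight a cluster receives, which is the property that matters when cluster size is informative.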