Between- and Within-Cluster Spearman Rank Correlations

Stat Med. 2025 Feb 10;44(3-4):e10326. doi: 10.1002/sim.10326.

Abstract

Clustered data are common in practice. Clustering arises when subjects are measured repeatedly, or subjects are nested in groups (e.g., households, schools). It is often of interest to evaluate the correlation between two variables with clustered data. There are three commonly used Pearson correlation coefficients (total, between-, and within-cluster), which together provide an enriched perspective of the correlation. However, these Pearson correlation coefficients are sensitive to extreme values and skewed distributions. They also vary with data transformation, which is arbitrary and often difficult to choose, and they are not applicable to ordered categorical data. Current nonparametric correlation measures for clustered data are only for the total correlation. Here we define population parameters for the between- and within-cluster Spearman rank correlations. The definitions are natural extensions of the Pearson between- and within-cluster correlations to the rank scale. We show that the total Spearman rank correlation approximates a linear combination of the between- and within-cluster Spearman rank correlations, where the weights are functions of rank intraclass correlations of the two random variables. We also discuss the equivalence between the within-cluster Spearman rank correlation and the covariate-adjusted partial Spearman rank correlation. Furthermore, we describe estimation and inference for the three Spearman rank correlations, conduct simulations to evaluate the performance of our estimators, and illustrate their use with data from a longitudinal biomarker study and a clustered randomized trial.

Keywords: clustered data; nonparametric correlation measures; rank association measures.

MeSH terms

  • Cluster Analysis
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Models, Statistical*
  • Statistics, Nonparametric