Background: Clustered data with binary outcomes are often analysed using random intercepts models or generalised estimating equations (GEE) resulting in cluster-specific or 'population-average' inference, respectively.
Methods: When a random effects model is fitted to clustered data, predictions may be produced for a member of an existing cluster by using estimates of the fixed effects (regression coefficients) and the random effect for the cluster (conditional risk calculation), or for a member of a new cluster (marginal risk calculation). We focus on the second. Marginal risk calculation from a random effects model is obtained by integrating over the distribution of random effects. However, in practice marginal risks are often obtained, incorrectly, using only estimates of the fixed effects (i.e. by effectively setting the random effects to zero). We compare these two approaches to marginal risk calculation in terms of model calibration.
Results: In simulation studies, it has been seen that use of the incorrect marginal risk calculation from random effects models results in poorly calibrated overall marginal predictions (calibration slope <1 and calibration in the large ≠ 0) with mis-calibration becoming worse with higher degrees of clustering. We clarify that this was due to the incorrect calculation of marginal predictions from a random intercepts model and explain intuitively why this approach is incorrect. We show via simulation that the correct calculation of marginal risks from a random intercepts model results in predictions with excellent calibration.
Conclusion: The logistic random intercepts model can be used to obtain valid marginal predictions by integrating over the distribution of random effects.