Background: A nested case-control (NCC) design within a prospective cohort study can realize substantial benefits for biomarker studies. In this context, it is natural to consider sample availability when selecting controls, so as to minimize data loss when implementing the design. However, doing so violates the random selection that the design requires and leads to biased analyses. Inverse probability weighting may improve the analysis, but the current approach using weighted Cox regression fails to maintain the benefits of the NCC design.
Methods: This paper introduces weighted conditional logistic regression. We illustrate the proposed analysis using data recently investigated in The Environmental Determinants of Diabetes in the Young (TEDDY). To limit potential data loss, the TEDDY NCC design was moderately selective in its choice of controls. A data-driven simulation study was performed to show the bias that arises when nonrandom control selection is ignored in the analysis, and how the proposed weighting corrects it.
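As a rough illustration of the idea only, the sketch below fits a conditional logistic model in which each control's term in the matched-set denominator is multiplied by an inverse probability weight. It is not the authors' implementation: the form of the weighted likelihood (case weight 1, control weight equal to the reciprocal of an assumed selection probability), the function names, and the toy data are all assumptions made for this example.

```python
import numpy as np

def weighted_clogit_loglik(beta, x, set_id, is_case, weight):
    """Weighted conditional log-likelihood for one scalar covariate.

    Assumed form: sum over matched sets of
    beta * x_case - log(sum_j weight_j * exp(beta * x_j)).
    """
    total = 0.0
    for s in np.unique(set_id):
        in_set = set_id == s
        eta = beta * x[in_set]
        m = eta.max()  # subtract the max for numerical stability
        log_denom = m + np.log(np.sum(weight[in_set] * np.exp(eta - m)))
        total += eta[is_case[in_set]].sum() - log_denom
    return total

def fit_weighted_clogit(x, set_id, is_case, weight, n_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the weighted log-likelihood above."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for s in np.unique(set_id):
            in_set = set_id == s
            xs = x[in_set]
            eta = beta * xs
            p = weight[in_set] * np.exp(eta - eta.max())
            p = p / p.sum()                   # weighted within-set probabilities
            mean = np.sum(p * xs)
            score += xs[is_case[in_set]].sum() - mean
            info += np.sum(p * (xs - mean) ** 2)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: 4 matched sets, each with 1 case and 2 controls.
set_id = np.repeat([1, 2, 3, 4], 3)
is_case = np.tile([True, False, False], 4)
x = np.array([1.0, 2.0, 3.0,  2.5, 1.0, 2.0,  0.5, 1.5, 1.0,  2.0, 2.5, 1.0])

# Assumed selection probabilities (cases fixed at 1); weights are reciprocals.
sel_prob = np.array([1.0, 0.5, 0.8,  1.0, 0.8, 0.8,
                     1.0, 0.5, 0.5,  1.0, 0.8, 0.5])
weight = 1.0 / sel_prob

beta_weighted = fit_weighted_clogit(x, set_id, is_case, weight)
beta_unweighted = fit_weighted_clogit(x, set_id, is_case, np.ones_like(weight))
print("weighted estimate:  ", beta_weighted)
print("unweighted estimate:", beta_unweighted)
```

Setting every weight to 1 recovers the ordinary conditional logistic fit, so comparing the two estimates gives a feel for how much the assumed selection probabilities shift the result while the matched-set structure is kept intact. Variance estimation, which would need to account for the weights, is left out of this sketch.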
Results: In the TEDDY data analysis, the standard analysis using conditional logistic regression estimated the parameter as -0.015 (95% confidence interval: -0.023, -0.007). The biased estimate using Cox regression was -0.011 (-0.019, -0.003). Weighted Cox regression estimated -0.013 (-0.026, 0.0004). The proposed weighted conditional logistic regression estimated -0.020 (-0.033, -0.007), a stronger negative effect size than the one obtained with conditional logistic regression. The simulation study also showed that the standard estimate of β, which ignores the nonrandom control selection, tends to be greater than the true β (ie, positive relative biases).
Conclusion: Weighted conditional logistic regression can enhance the analysis by offering flexibility in the selection of controls, while maintaining the matching.
Keywords: inverse probability weighting; nested case-control design; prospective cohort study; selection bias; weighted conditional logistic regression.
© 2019 John Wiley & Sons, Ltd.