Objective: To evaluate whether different categorization strategies for introducing continuous variables in multivariable logistic regression analysis result in prognostic models that differ in content and performance.
Study design and setting: Backward multivariable logistic regression (P<0.05 and P<0.157) was performed with candidate predictors of persistent complaints in patients with nonspecific neck pain. The continuous variables were introduced into the analysis in three separate ways: (1) continuous, (2) split into multiple categories, and (3) dichotomized. The resulting models were compared with regard to model content, goodness of fit, explained variation, and discriminative ability. We also compared model performance when categorization was applied before versus after the selection procedure.
Results: For P<0.05, the final model with continuous variables, containing five predictors, disagreed on three predictors with the models from both categorization strategies. For P<0.157, the model with continuous variables, containing six predictors, disagreed on three predictors with the model containing variables split into multiple categories and on six predictors with the model containing dichotomized variables. The models in which the variables were kept continuous performed best. There was no clear difference in performance between categorization before and after the selection procedure.
Conclusion: Categorization of continuous variables resulted in different content and poorer performance of the final model.
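The three ways of introducing a continuous variable named in the study design can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of sample quartiles and the sample median as cut-points, and the example ages are all assumptions made for illustration, since the abstract does not state which cut-points were used.

```python
from statistics import median, quantiles


def code_continuous(values):
    """Strategy 1: keep the variable on its original scale."""
    return list(values)


def code_categories(values, n_groups=4):
    """Strategy 2: split into ordered categories 0..n_groups-1.

    Cut-points are sample quantiles (an assumption for illustration).
    """
    cuts = quantiles(values, n=n_groups)  # yields n_groups - 1 cut-points
    # A value's category is the number of cut-points it exceeds.
    return [sum(v > c for c in cuts) for v in values]


def code_dichotomized(values):
    """Strategy 3: dichotomize at the sample median (assumed cut-point)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


# Hypothetical ages in years; not data from the study.
ages = [23, 31, 38, 42, 47, 53, 58, 64]

as_continuous = code_continuous(ages)
as_categories = code_categories(ages)
as_dichotomy = code_dichotomized(ages)
```

Each coded version would then be entered as the predictor in the logistic regression. The sketch makes the information loss visible: the dichotomized coding treats every patient on the same side of the cut-point as identical, whereas the continuous coding keeps all distinct values.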