Objective: To improve the calibration of logistic regression (LR) estimates using local information.
Background: Individualized risk assessment tools are increasingly being utilized. External validation of these tools often reveals poor model calibration.
Methods: We combine a clustering algorithm with an LR model to produce probability estimates that are close to the true probabilities for a particular case. The new method is compared to a standard LR model in terms of calibration, as measured by the sum of absolute differences (SAD) between model estimates and true probabilities, and discrimination, as measured by area under the ROC curve (AUC).
Results: We evaluate the new method on two synthetic data sets. Compared with the standard LR model, SADs are significantly lower (p < 0.0001) in both data sets, and AUCs are significantly higher in one data set (p < 0.01).
Conclusion: The results suggest that the proposed method may help improve the calibration of LR models.
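
Illustration: The two evaluation measures named in the Methods can be sketched as follows. This is a minimal sketch and not the authors' implementation. The function names and the toy values are hypothetical, and the AUC is computed with the standard pairwise-ranking formulation, which may differ from the procedure used in the study. The clustering step is not shown, because the abstract does not specify it.

```python
def sum_absolute_differences(estimates, true_probs):
    """SAD: sum of |model estimate - true probability| over all cases."""
    if len(estimates) != len(true_probs):
        raise ValueError("inputs must have the same length")
    return sum(abs(e - t) for e, t in zip(estimates, true_probs))


def auc(estimates, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative case count as half a correct pair.
    """
    pos = [e for e, y in zip(estimates, labels) if y == 1]
    neg = [e for e, y in zip(estimates, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy values for illustration only, not data from the study.
true_probs = [0.1, 0.4, 0.6, 0.9]   # true probabilities (assumed known)
estimates = [0.2, 0.3, 0.7, 0.8]    # model probability estimates
labels = [0, 0, 1, 1]               # observed binary outcomes

sad = sum_absolute_differences(estimates, true_probs)
area = auc(estimates, labels)
```

A lower SAD indicates better calibration and a higher AUC indicates better discrimination. SAD requires the true probability of each case, which is typically available only when the data-generating process is known, as with synthetic data.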