Objective: To assess whether customized versions of the Simplified Acute Physiology Score (SAPS) II and the Mortality Probability Model (MPM) II0 agree on the identity of intensive care unit quality outliers within a multiple-center database.
Design: Retrospective database analysis.
Setting and patients: Patient subset of the Project IMPACT database consisting of 39,617 adult patients admitted to surgical, medical, and mixed surgical-medical intensive care units at 54 hospitals between 1995 and 1999 who met inclusion criteria for SAPS II and MPM II0.
Interventions: Customized versions of SAPS II and MPM II0 were obtained by fitting new logistic regressions to the data, with the risk score as the independent variable and outcome at hospital discharge as the dependent variable. The data set was divided randomly into a training set and a validation set. Each model was customized on the training set; model performance was then assessed in the validation set by the area under the receiver operating characteristic curve (discrimination) and the Hosmer-Lemeshow statistic (calibration). The final models were based on the entire data set. The level of agreement between the customized models on the identity of quality outliers was evaluated by kappa analysis.
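The customization procedure described above can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, and the function names (`fit_logistic`, `auc`, `hosmer_lemeshow`), the 50/50 split, the score range, and the use of ten equal-sized risk groups are all assumptions made for illustration. It refits a one-predictor logistic regression on a training half and then checks discrimination and calibration on the held-out half.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: beta += H^-1 g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def auc(pred, y):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    pos = [p for p, yi in zip(pred, y) if yi == 1]
    neg = [p for p, yi in zip(pred, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def hosmer_lemeshow(pred, y, groups=10):
    """Hosmer-Lemeshow chi-square over equal-sized groups of sorted risk."""
    pairs = sorted(zip(pred, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return stat

# Simulated (hypothetical) severity scores and hospital mortality outcomes.
random.seed(0)
scores = [random.uniform(0, 80) for _ in range(2000)]
died = [1 if random.random() < sigmoid(-4.0 + 0.08 * s) else 0 for s in scores]

# Random split into training and validation halves.
idx = list(range(len(scores)))
random.shuffle(idx)
train, valid = idx[:1000], idx[1000:]

# Customize on the training set, then assess on the validation set.
b0, b1 = fit_logistic([scores[i] for i in train], [died[i] for i in train])
pred_valid = [sigmoid(b0 + b1 * scores[i]) for i in valid]
y_valid = [died[i] for i in valid]
auc_valid = auc(pred_valid, y_valid)
hl_valid = hosmer_lemeshow(pred_valid, y_valid)
print(f"b0={b0:.3f} b1={b1:.3f} AUC={auc_valid:.3f} HL={hl_valid:.2f}")
```

In a sketch like this, a high area under the curve indicates good discrimination, and a small Hosmer-Lemeshow statistic (a large p value against the chi-square reference distribution) indicates no evidence of poor calibration.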
Measurements and main results: Both customized models exhibited good discrimination and good calibration in this database. The area under the receiver operating characteristic curve was 0.83 for MPM II0 and 0.872 for SAPS II following model customization. The Hosmer-Lemeshow statistic was 12.3 (p >.14) for MPM II0 and 8.17 (p >.42) for SAPS II after customization. Kappa analysis showed only fair agreement between the two customized models with regard to the identity of the quality outliers: kappa = 0.44 (95% confidence interval, 0.24, 0.65).
Conclusions: Customization of SAPS II and MPM II0 to the Project IMPACT database resulted in well-calibrated models. Despite this, the models exhibited only a moderate level of agreement on which hospitals were designated as quality outliers. Seventeen of the 54 hospitals were categorized differently depending on which of the two scoring systems was used. Therefore, the rating of quality of care appears, in part, to be a function of the prediction model used.
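The kappa analysis used to compare the two models' outlier designations can be sketched as follows. This is an illustrative computation of Cohen's kappa, not the authors' analysis: the hospital labels below are hypothetical toy data, and the three-category scheme ("high", "average", "low") and the function name `cohen_kappa` are assumptions.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of hospitals given the same label by both models.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each model's marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical outlier designations for eight hospitals under two models.
model_a = ["high", "high", "average", "average", "average", "low", "low", "average"]
model_b = ["high", "average", "average", "average", "low", "low", "low", "average"]

kappa = cohen_kappa(model_a, model_b)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two models can assign the same label to most hospitals and still show only limited agreement on the outliers.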