Purpose: To develop dynamic machine learning prediction models for patients with coronavirus disease 2019 (COVID-19) using daily multidimensional data.
Methods: Hospitalized COVID-19 patients at Peking Union Medical College Hospital from Nov 2nd, 2022, to Jan 13th, 2023, were enrolled in this study. The outcome was defined as deterioration or recovery of the patient's condition. Demographics, comorbidities, laboratory test results, vital signs, and treatments were used to train the models. A separate XGBoost model was trained and validated for each of the following days to be predicted. The Shapley additive explanations (SHAP) method was used to analyze feature importance.
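As an illustration of how daily records could yield one training set per prediction day, the following is a minimal sketch and not the authors' pipeline. It assumes each patient-day record carries a feature dictionary and a binary event flag, and that the label for horizon h is the event flag observed h days later. The function name `build_horizon_datasets` and the record keys are hypothetical.

```python
from collections import defaultdict


def build_horizon_datasets(records, max_horizon):
    """Group patient-day records into one labelled dataset per horizon.

    records: list of dicts with hypothetical keys
        'patient' (id), 'day' (int), 'features' (dict), 'event' (0 or 1).
    Returns {h: [(features, label), ...]} where label is the event flag
    recorded h days after the observation day (an assumed labelling rule).
    """
    # Index every record by patient and by day for direct lookup.
    by_patient = defaultdict(dict)
    for rec in records:
        by_patient[rec["patient"]][rec["day"]] = rec

    datasets = {h: [] for h in range(1, max_horizon + 1)}
    for days in by_patient.values():
        for day, rec in days.items():
            for h in range(1, max_horizon + 1):
                target = days.get(day + h)
                if target is None:
                    continue  # no observation h days ahead, so no label
                datasets[h].append((rec["features"], target["event"]))
    return datasets


# Toy data (invented for illustration only).
toy_records = [
    {"patient": "A", "day": 1, "features": {"spo2": 96}, "event": 0},
    {"patient": "A", "day": 2, "features": {"spo2": 94}, "event": 0},
    {"patient": "A", "day": 3, "features": {"spo2": 90}, "event": 1},
    {"patient": "B", "day": 1, "features": {"spo2": 97}, "event": 0},
    {"patient": "B", "day": 2, "features": {"spo2": 98}, "event": 0},
]
toy_datasets = build_horizon_datasets(toy_records, max_horizon=2)
```

Each entry of the returned dictionary could then be passed to its own classifier, which is one way a patient cohort can produce many more observations than patients.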
Results: A total of 995 patients were enrolled, generating 7228 and 3170 observations for the two prediction models, respectively. In the deterioration prediction model, the minimum area under the receiver operating characteristic curve (AUROC) for the following 7 days was 0.786 (95% CI 0.721-0.851), while the AUROC on the next day was 0.872 (0.831-0.913). In the recovery prediction model, the minimum AUROC for the following 3 days was 0.675 (0.583-0.767), while the AUROC on the next day was 0.823 (0.770-0.876). The top 5 features for deterioration prediction on the 7th day were disease course, length of hospital stay, hypertension, and diastolic blood pressure. Those for recovery prediction on the 3rd day were age, D-dimer levels, disease course, creatinine levels and corticosteroid therapy.
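For readers unfamiliar with the metric, the sketch below shows one way an AUROC and a 95% confidence interval can be computed. The source does not state how its intervals were obtained. The reported intervals are symmetric around the point estimates, which is consistent with a normal approximation, so the Hanley-McNeil standard error is used here purely as an assumed example. The function names are hypothetical.

```python
import math


def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation interval using the Hanley-McNeil standard error.

    The interval is clipped to [0, 1]. This is an assumed method, chosen
    only for illustration.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


# Invented scores, for illustration only.
example_auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
example_ci = hanley_mcneil_ci(0.8, n_pos=50, n_neg=50)
```

Because the interval width depends on the number of positive and negative cases, a model evaluated on fewer events will show a wider interval at the same AUROC.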
Conclusion: The models could accurately predict the dynamics of Omicron patients' conditions using daily multidimensional variables, revealing important features including comorbidities (e.g., hyperlipidemia), age, disease course, vital signs, D-dimer levels, corticosteroid therapy and oxygen therapy.
Keywords: COVID-19; Machine learning; Omicron; Prediction model.
© 2023 The Authors.