Prediction of Disease Progression of COVID-19 Based upon Machine Learning

Int J Gen Med. 2021 Apr 29:14:1589-1598. doi: 10.2147/IJGM.S294872. eCollection 2021.

Abstract

Background: Since December 2019, COVID-19 has spread throughout the world. Clinical outcomes vary among infected individuals. Therefore, it is vital to identify patients at high risk of disease progression.

Methods: This retrospective, multicenter cohort study included COVID-19 patients from Huoshenshan Hospital and Taikang Tongji Hospital (Wuhan, China). Clinical features that differed significantly between the severe and nonsevere groups were identified by univariate analysis. These features were then used to build machine-learning classifier models that predict whether a COVID-19 case would be severe or nonsevere. Two test sets, one from each hospital, were gathered to evaluate the predictive performance of the models.
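The univariate screening step can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the Mann-Whitney U test here is an assumption, as are the 0.05 threshold, the feature names, and the toy values, none of which come from the study data.

```python
import math

def mann_whitney_p(xs, ys):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Ties receive average ranks; no tie correction is applied to the variance,
    which is adequate for this illustration only.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    n1, n2 = len(xs), len(ys)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))

def screen_features(severe, nonsevere, alpha=0.05):
    """Keep features whose distributions differ between the two groups."""
    return [name for name in severe
            if mann_whitney_p(severe[name], nonsevere[name]) < alpha]

# Hypothetical toy cohort (values invented for illustration).
severe = {
    "d_dimer": [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 2.5, 3.1],
    "heart_rate": [80, 92, 76, 88, 84, 79, 90, 85],
}
nonsevere = {
    "d_dimer": [0.3, 0.5, 0.4, 0.8, 0.6, 0.2, 0.7, 0.9],
    "heart_rate": [82, 78, 91, 86, 83, 77, 89, 87],
}

selected = screen_features(severe, nonsevere)
print(selected)
```

In this toy cohort only the feature that separates the two groups passes the screen; the retained features would then be passed to the classifier-training step.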

Results: A total of 455 patients were included, and 21 features showing significant differences between the severe and nonsevere groups were selected for the training and validation set. In the validation set, the k-nearest neighbor model with its optimal subset of eleven features achieved the highest area under the curve (AUC) among the four models. D-dimer, CRP, and age were the three most important features in the optimal feature subsets. For the test set from Huoshenshan Hospital, the highest AUC was obtained with a support vector machine model. Software for predicting disease progression based on machine learning was developed.
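To make the idea of ranking feature subsets by validation AUC concrete, the following is a minimal sketch using a hand-written k-nearest neighbor scorer. The abstract does not state how the optimal subsets were searched, so the exhaustive search, the choice of k = 3, the assumption that features are already on comparable scales, and the toy data are all illustrative assumptions rather than the authors' procedure.

```python
from itertools import combinations

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k=3):
    """Fraction of severe cases (label 1) among the k nearest training points."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist)[:k]
    return sum(train_y[i] for i in nearest) / k

def best_subset(train_X, train_y, val_X, val_y, n_features, k=3):
    """Exhaustively score every feature subset; keep the first with the highest AUC."""
    best_auc, best_cols = -1.0, ()
    for size in range(1, n_features + 1):
        for cols in combinations(range(n_features), size):
            tr = [[row[c] for c in cols] for row in train_X]
            va = [[row[c] for c in cols] for row in val_X]
            scores = [knn_score(tr, train_y, x, k) for x in va]
            current = auc(val_y, scores)
            if current > best_auc:
                best_auc, best_cols = current, cols
    return best_auc, best_cols

# Hypothetical data: column 0 is informative, column 1 is noise.
train_X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 2.0], [1.0, 5.0], [1.1, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.15, 1.0], [0.25, 5.0], [0.95, 1.0], [1.05, 4.5]]
val_y = [0, 0, 1, 1]

result = best_subset(train_X, train_y, val_X, val_y, n_features=2)
print(result)
```

Exhaustive search is feasible only for a handful of features; with the 21 candidate features reported here it would require evaluating about two million subsets, so a greedy or ranked search would be a more practical choice in a real analysis.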

Conclusion: Predictive models based on machine learning were successfully established and, with optimal feature subsets, achieved satisfactory performance in predicting disease progression.

Keywords: COVID-19; disease progression; machine-learning models.

Grants and funding

This work was supported by the National Natural Science Foundation of China (81700483), Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjAX0302, cstc2020jcyj-msxmX1100), and Army Medical University Frontier Technology Research Program (2019XLC3051). The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding authors had full access to all the data in the study, and had final responsibility for the decision to submit for publication.