Multidimensional Machine Learning for Assessing Parameters Associated With COVID-19 in Vietnam: Validation Study

JMIR Form Res. 2023 Feb 16;7:e42895. doi: 10.2196/42895.

Abstract

Background: Machine learning (ML) is a branch of artificial intelligence. Its algorithms are applied to large data sets to detect patterns, learn from the results, and perform tasks autonomously without explicit instructions on how to solve a problem. Emerging diseases such as COVID-19 generate important data for ML. Therefore, all relevant parameters should be explicitly quantified and modeled.

Objective: The purpose of this study was to determine (1) the overall preclinical characteristics, (2) the cumulative cutoff values and risk ratios (RRs), and (3) the factors associated with COVID-19 severity in unidimensional and multidimensional analyses involving 2173 SARS-CoV-2 patients.

Methods: The study population consisted of 2173 patients (1587 patients with mild or asymptomatic status [mild group], 377 patients with moderate status [moderate group], and 209 patients with severe status [severe group]). Patient status was recorded from September 2021 to March 2022. Two correlation tests, relative risk, and RR were used to eliminate unbalanced parameters and select the most notable parameters. Two independent methods, hierarchical cluster analysis and k-means, were used to classify parameters according to their r values. Finally, network analysis provided a 3-dimensional view of the results.
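The grouping of parameters by their r values with two independent methods can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not report the distance measure, linkage, or number of clusters, so the distance d = 1 - r, average linkage, k = 2, the parameter names, and the toy correlation matrix are all assumptions made for illustration.

```python
# Toy correlation matrix for four hypothetical parameters (NOT study data).
names = ["param_a", "param_b", "param_c", "param_d"]
r = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]


def hierarchical_clusters(r, k):
    """Agglomerative clustering with average linkage on d = 1 - r (assumed)."""
    clusters = [[i] for i in range(len(r))]

    def dist(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(1 - r[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        _, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(rows, init, iters=100):
    """Lloyd's k-means on matrix rows; `init` gives the starting centroid rows."""
    centroids = [list(rows[i]) for i in init]
    labels = None
    for _ in range(iters):
        # Assign each row to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])),
            )
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


hier = hierarchical_clusters(r, k=2)
km = kmeans(r, init=(0, 2))  # fixed starting rows for reproducibility
print([[names[i] for i in group] for group in hier])
print(dict(zip(names, km)))
```

On this toy matrix both methods place the two strongly correlated pairs in separate groups, which is the kind of agreement that running two independent methods is meant to check.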

Results: COVID-19 severity was significantly associated with age (mild-moderate group: RR 4.19, 95% CI 3.58-4.95; P<.001), scoring index of chest x-ray (mild-moderate group: RR 3.29, 95% CI 2.76-3.92; P<.001; moderate-severe group: RR 3.03, 95% CI 2.4023-3.8314; P<.001), percentage of neutrophils (mild-moderate group: RR 3.18, 95% CI 2.73-3.70; P<.001; moderate-severe group: RR 3.32, 95% CI 2.6480-4.1529; P<.001), quantity of neutrophils (moderate-severe group: RR 3.15, 95% CI 2.6153-3.8025; P<.001), albumin (moderate-severe group: RR 0.46, 95% CI 0.3650-0.5752; P<.001), C-reactive protein (mild-moderate group: RR 3.4, 95% CI 2.91-3.97; P<.001), and ratio of lymphocytes (moderate-severe group: RR 0.34, 95% CI 0.2743-0.4210; P<.001). Notably, some correlations were inverted between severity groups. Alanine transaminase and leucocytes showed a significant negative correlation (r=-1; P<.001) in the mild group and a significant positive correlation in the moderate group (r=1; P<.001). Transferrin and anion Cl showed a significant positive correlation (r=1; P<.001) in the mild group and a significant negative correlation in the moderate group (r=-0.59; P<.001). The clustering and network analysis showed that in the mild-moderate group, the closest neighbors of COVID-19 severity were ferritin and age. C-reactive protein, scoring index of chest x-ray, albumin, and lactate dehydrogenase were the next closest neighbors of these 3 factors. In the moderate-severe group, the closest neighbors of COVID-19 severity were ferritin, fibrinogen, albumin, quantity of lymphocytes, scoring index of chest x-ray, white blood cell count, lactate dehydrogenase, and quantity of neutrophils.
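For readers unfamiliar with how an RR and its 95% CI are obtained from a 2x2 table, the following is a minimal sketch. It assumes the common log-scale (Katz) interval; the abstract does not state which interval method the study used, and the function name and counts below are hypothetical, not study data.

```python
import math


def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) for two independent binomial proportions.
    se = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 60 of 200 patients above a cutoff progress,
# versus 30 of 300 patients below it.
rr, lo, hi = risk_ratio(60, 200, 30, 300)
print(f"RR {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An RR above 1 with a CI that excludes 1 indicates higher risk in the group above the cutoff; an RR below 1, as reported for albumin and the ratio of lymphocytes, indicates lower risk.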

Conclusions: This multidimensional study in Vietnam identified possible correlations between several parameters and COVID-19 severity, which may provide clinical reference markers for surveillance and diagnostic management.

Keywords: C-reactive protein; COVID-19; age; albumin; hierarchical cluster analysis; mild; moderate; multidimensional analysis; percentage and quantity of neutrophils; ratio of lymphocytes; regression analysis; scoring index of chest x-ray; severe.