Background: Many risk factors have emerged for coronavirus disease 2019 (COVID-19). It remains largely unknown how these factors collectively predict the risk of COVID-19 infection or of severe infection (i.e., hospitalization).
Methods: Among older adults (69.3 ± 8.6 years) in UK Biobank, COVID-19 data were downloaded for 4,510 participants with 7,539 test cases. We also downloaded baseline data collected 10-14 years earlier, including demographics, biochemistry, body mass, and other factors, as well as antibody titers for 20 infectious diseases ranging from common to rare. Permutation-based linear discriminant analysis was used to predict COVID-19 risk and hospitalization risk. Model performance was assessed with probability and threshold metrics: area under the receiver operating characteristic curve (AUC), specificity, sensitivity, and quadratic mean.
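The metrics named above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the scores and labels are made-up toy values, the function names `auc_rank` and `threshold_metrics` are hypothetical, and the quadratic mean is assumed here to be taken over sensitivity and specificity.

```python
import math

def auc_rank(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, and their quadratic mean at one cutoff.
    Assumption: the quadratic mean is sqrt((sens^2 + spec^2) / 2)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    quadratic_mean = math.sqrt((sensitivity ** 2 + specificity ** 2) / 2)
    return sensitivity, specificity, quadratic_mean

# Made-up toy data: predicted probabilities and true outcomes (1 = positive).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = auc_rank(scores, labels)
sens, spec, qmean = threshold_metrics(scores, labels, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f} quadratic mean={qmean:.3f}")
```

AUC summarizes discrimination across all cutoffs, whereas sensitivity, specificity, and the quadratic mean depend on the chosen threshold (0.5 in this toy example).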
Results: The "best-fit" model for predicting COVID-19 risk achieved excellent discrimination (AUC=0.969, 95% CI=0.934-1.000). Factors included age, immune markers, lipids, and serology titers to common pathogens such as human cytomegalovirus. The "best-fit" model for hospitalization showed more modest discrimination (AUC=0.803, 95% CI=0.663-0.943) and included only serology titers.
Conclusions: Accurate risk profiles can be created using standard self-report and biomedical data collected in public health and medical settings. It is also worthwhile to investigate further whether prior host immunity predicts current host immunity to COVID-19.
Keywords: COVID-19; SARS-CoV-2; antibodies; cohort study; epidemiology; host response; linear discriminant analysis; machine learning; non-parametric; serology.