Host genetics and COVID-19 severity: increasing the accuracy of latest severity scores by Boolean quantum features

Front Genet. 2024 May 22:15:1362469. doi: 10.3389/fgene.2024.1362469. eCollection 2024.

Abstract

The impact of common and rare variants in COVID-19 host genetics has been widely studied. In particular, in Fallerini et al. (Human genetics, 2022, 141, 147-173), common and rare variants were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the most informative Boolean features with respect to the genetic bases of severity. The Boolean features selected by these models were combined into an Integrated PolyGenic Score (IPGS), which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS alone leads to an accuracy of 55%-60% on different cohorts; a logistic regression with both IPGS and age as inputs leads to an accuracy of 75%. The goal of this paper is to improve on these results by using not only the most informative Boolean features with respect to the genetic bases of severity but also information on the host organs involved in the disease. In this study, we generalize the IPGS by adding a statistical weight for each organ, through the transformation of Boolean features into "Boolean quantum features," inspired by quantum mechanics. The organ coefficients were set by applying the genetic algorithm PyGAD, after which we defined two new integrated polygenic scores (IPGSph1 and IPGSph2). By applying a logistic regression with IPGS, IPGSph2 (or, interchangeably, IPGSph1), and age as inputs, we reached an accuracy of 84%-86%, improving on the 75% accuracy reported in Fallerini et al. (Human genetics, 2022, 141, 147-173) by roughly 10 percentage points.
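As a rough illustration of the first step described above, the following Python sketch converts a list of per-patient variants into gene-level Boolean features (1 if the gene carries at least one variant, 0 otherwise). The gene labels, the `toy_score` function, and its signed-count rule are hypothetical and introduced only for illustration; they are not the published IPGS formula, and the sketch omits the LASSO feature selection and the organ weights.

```python
from typing import Dict, Iterable, List, Set


def boolean_features(variant_genes: Iterable[str], genes: List[str]) -> Dict[str, int]:
    """Map each gene to 1 if the patient carries at least one variant in it, else 0."""
    carried: Set[str] = set(variant_genes)
    return {gene: int(gene in carried) for gene in genes}


def toy_score(
    features: Dict[str, int],
    severity_genes: Set[str],
    protective_genes: Set[str],
) -> int:
    """Hypothetical signed count (an assumption, not the published IPGS):
    +1 for each present feature labelled as severity-associated,
    -1 for each present feature labelled as protective."""
    plus = sum(features.get(g, 0) for g in severity_genes)
    minus = sum(features.get(g, 0) for g in protective_genes)
    return plus - minus


# Hypothetical gene panel and patient; the names are placeholders.
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
patient_variants = ["GENE_A", "GENE_A", "GENE_C"]  # two variants in GENE_A, one in GENE_C

features = boolean_features(patient_variants, GENES)
score = toy_score(
    features,
    severity_genes={"GENE_A", "GENE_B"},
    protective_genes={"GENE_D"},
)
print(features)
print(score)
```

In this toy case, multiple variants in the same gene collapse to a single Boolean feature, which is the presence/absence encoding the abstract describes. In the study itself, which features enter the score is decided by the ensemble of LASSO logistic regression models rather than by hand-picked sets.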

Keywords: COVID-19; genetic algorithm; genetic science modeling; host genetics; integrated polygenic score; logistic regression.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study is part of the GEN-COVID Multicenter Study (https://sites.google.com/dbm.unisi.it/gen-covid), the Italian multicenter study aimed at identifying the COVID-19 host genetic bases. Specimens were provided by the COVID-19 Biobank of Siena, which is part of the Genetic Biobank of Siena, member of BBMRI-IT, of the Telethon Network of Genetic Biobanks (project no. GTB18001), EuroBioBank, and RD-Connect. Some authors of this paper are members of the European Reference Network on rare respiratory diseases ERN-LUNG. We thank private donors for the support provided to AR (Department of Medical Biotechnologies, University of Siena) for the COVID-19 host genetics research project (D.L n.18 of 17 March 2020). We also thank the COVID-19 Host Genetics Initiative (https://www.covid19hg.org/), MIUR project “Dipartimenti di Eccellenza 2018–2020” to the Department of Medical Biotechnologies, University of Siena, Italy, and “Bando Ricerca COVID-19 Toscana” project to Azienda Ospedaliero-Universitaria Senese. We thank Intesa San Paolo for the 2020 charity fund dedicated to the project N B/2020/0119 “Identificazione delle basi genetiche determinanti la variabilità clinica della risposta a COVID-19 nella popolazione italiana.” We thank the Italian Ministry of University and Research for funding within the “Bando FISR 2020” in COVID-19 for the project “Editing dell’RNA contro il Sars-CoV-2: hackerare il virus per identificare bersagli molecolari e attenuare l’infezione - HACKTHECOV” and the Istituto Buddista Italiano Soka Gakkai for funding the project “PAT-COVID: Host genetics and pathogenetic mechanisms of COVID-19” (ID n. 2020–2016_RIC_3). We thank EU project H2020-SC1-FA-DTS-2018–2020, entitled “International consortium for integrative genomics prediction (INTERVENE)”, Grant Agreement No. 101016775. Generous support was also received from private donations by Mrs. Maurizio Traglio, Enzo Cattaneo, and Alberto Borella.