Computational identification of differentially-expressed genes as suggested novel COVID-19 biomarkers: A bioinformatics analysis of expression profiles

Comput Struct Biotechnol J. 2023:21:3339-3354. doi: 10.1016/j.csbj.2023.06.007. Epub 2023 Jun 12.

Abstract

COVID-19 was declared a pandemic in March 2020, and since then it has continued to spread rapidly to almost every corner of the world, despite the many efforts made to contain it. SARS-CoV-2 has one of the largest genomes among RNA viruses and presents unique characteristics that differentiate it from other coronaviruses, which makes finding a sufficiently effective cure or vaccine even more challenging. Using RNA sequencing (RNA-Seq) data, this work aims to evaluate whether the expression of specific host genes varies across grades of disease severity and to determine the molecular origins of the differences in response to SARS-CoV-2 infection among patients. In addition to quantifying gene expression, RNA-Seq data allow the discovery of new transcripts, the identification of alternative splicing events, and the detection of allele-specific expression and post-transcriptional alterations. For this reason, we performed differential expression analysis on the expression profiles of COVID-19 patients, using RNA-Seq data from a public NCBI repository, and obtained the lists of all differentially expressed genes (DEGs) emerging from seven experimental conditions. We then performed a Gene Set Enrichment Analysis (GSEA) on these genes to find possible correlations between DEGs and known disease phenotypes. We focused mainly on the DEGs arising from contrasts involving severe conditions, to infer possible relationships between a worsening of the clinical picture and the over-representation of specific genes. Based on these results, this study identifies a small group of genes that are up-regulated in the severe form of the disease. The EXOSC5, MESD, REXO2, and TRMT2A genes are either not differentially expressed or absent in the other conditions, which makes them good biomarker candidates for the severe form of COVID-19.
Specific differentially expressed genes, whether up-regulated or down-regulated, that play a distinct role in each COVID-19 condition could serve as biomarkers and assist in early diagnosis.
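As a rough illustration of the selection criterion described in the abstract (genes up-regulated in the severe contrast but not differentially expressed, or absent, in the other contrasts), the following Python sketch applies set operations to per-contrast differential expression tables. The cutoffs (|log2 fold change| ≥ 1, adjusted p < 0.05), the `DEResult` record, the contrast name, and the gene names are hypothetical placeholders, not the authors' parameters or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DEResult:
    gene: str
    log2_fold_change: float
    padj: float  # multiple-testing-adjusted p-value

# Hypothetical cutoffs: the abstract does not state the thresholds actually used.
LFC_CUTOFF = 1.0
PADJ_CUTOFF = 0.05

def is_deg(r: DEResult) -> bool:
    """A gene counts as differentially expressed if both cutoffs are met."""
    return abs(r.log2_fold_change) >= LFC_CUTOFF and r.padj < PADJ_CUTOFF

def up_regulated(results: list[DEResult]) -> set[str]:
    """DEGs with a positive log2 fold change."""
    return {r.gene for r in results if is_deg(r) and r.log2_fold_change > 0}

def severe_specific_candidates(
    severe_results: list[DEResult],
    other_contrasts: dict[str, list[DEResult]],
) -> set[str]:
    """Genes up-regulated in the severe contrast that are not DEGs in any other
    contrast. Genes missing from another contrast's table are kept."""
    candidates = up_regulated(severe_results)
    for results in other_contrasts.values():
        candidates -= {r.gene for r in results if is_deg(r)}
    return candidates

# Toy, made-up values for illustration only.
severe = [
    DEResult("GENE_A", 2.1, 0.001),
    DEResult("GENE_B", 1.5, 0.01),
    DEResult("GENE_C", -1.8, 0.002),  # down-regulated, so not a candidate
    DEResult("GENE_D", 0.4, 0.03),    # fold change below cutoff
]
others = {
    "mild_vs_control": [
        DEResult("GENE_B", 1.2, 0.02),  # also a DEG here, so excluded
        DEResult("GENE_A", 0.3, 0.6),   # not a DEG here, so GENE_A is kept
    ],
}

print(sorted(severe_specific_candidates(severe, others)))  # ['GENE_A']
```

Treating genes missing from another contrast's table as "not differentially expressed" mirrors the abstract's wording ("not differentially expressed or not present"); a stricter variant could instead require each candidate to be measured in every contrast.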

Keywords: Biomarker; COVID-19; Differential expression analysis; Gene set enrichment analysis; RNA-Sequencing; SARS-CoV-2.