A mixture model for determining SARS-Cov-2 variant composition in pooled samples

Bioinformatics. 2022 Mar 28;38(7):1809-1815. doi: 10.1093/bioinformatics/btac047.

Abstract

Motivation: Despite of the fast development of highly effective vaccines to control the current COVID-19 pandemics, the unequal distribution and availability of these vaccines worldwide and the number of people infected in the world lead to the continuous emergence of Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) variants of concern. Therefore, it is likely that real-time genomic surveillance will be continuously needed as an unceasing monitoring tool, necessary to follow the spread of the disease and the evolution of the virus. In this context, new genomic variants of SARS-CoV-2, including variants refractory to current vaccines, makes genomic surveillance programs tools of utmost importance. Nevertheless, the lack of appropriate analytical tools to quickly and effectively access the viral composition in meta-transcriptomic sequencing data, including environmental surveillance, represent possible challenges that may impact the fast adoption of this approach to mitigate the spread and transmission of viruses.

Results: We propose a statistical model for the estimation of the relative frequencies of SARS-CoV-2 variants in pooled samples. This model is built by considering a previously defined selection of genomic polymorphisms that characterize SARS-CoV-2 variants. The methods described here support both raw sequencing reads for polymorphisms-based markers calling and predefined markers in the variant call format. Results obtained using simulated data show that our method is quite effective in recovering the correct variant proportions. Further, results obtained by considering longitudinal data from wastewater samples of two locations in Switzerland agree well with those describing the epidemiological evolution of COVID-19 variants in clinical samples of these locations. Our results show that the described method can be a valuable tool for tracking the proportions of SARS-CoV-2 variants in complex mixtures such as waste water and environmental samples.

Availability and implementation: http://github.com/rvalieris/LCS.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Gene Expression Profiling
  • Genomics
  • Humans
  • SARS-CoV-2* / genetics

Supplementary concepts

  • SARS-CoV-2 variants