Cov2clusters: genomic clustering of SARS-CoV-2 sequences

BMC Genomics. 2022 Oct 19;23(1):710. doi: 10.1186/s12864-022-08936-4.

Abstract

Background: The COVID-19 pandemic remains a global public health concern. Advances in sequencing technologies have enabled the generation of large volumes of SARS-CoV-2 whole genome sequence (WGS) data and the rapid sharing of sequences through global repositories, allowing almost real-time genomic analysis of the pathogen. WGS data have been used previously to group genetically similar viral pathogens to reveal evidence of transmission, including through methods that identify distinct clusters on a phylogenetic tree. Identifying clusters of linked cases can aid in the regional surveillance and management of the disease. In this study, we present a novel method for producing stable genomic clusters of SARS-CoV-2 cases, cov2clusters, and compare the accuracy and stability of our approach to previous methods used for phylogenetic clustering using real-world SARS-CoV-2 sequence data obtained from British Columbia, Canada.

Results: We found that cov2clusters produced more stable clusters than previously used phylogenetic clustering methods when sequence data were added through time, mimicking the increase in sequence data over the course of the pandemic. Our method also showed high accuracy when predicting epidemiologically informed clusters from sequence data.

Conclusions: Our new approach allows for the identification of stable clusters of SARS-CoV-2 from WGS data. Producing high-resolution SARS-CoV-2 clusters from sequence data alone can be a challenge and, where possible, both genomic and epidemiological data should be used in combination.

Keywords: Bioinformatics; Public health; SARS-CoV-2; Whole genome sequencing.

MeSH terms

  • COVID-19* / epidemiology
  • Cluster Analysis
  • Genome, Viral
  • Genomics
  • Humans
  • Pandemics
  • Phylogeny
  • SARS-CoV-2* / genetics