VirusRecom: an information-theory-based method for recombination detection of viral lineages and its application on SARS-CoV-2

Brief Bioinform. 2023 Jan 19;24(1):bbac513. doi: 10.1093/bib/bbac513.

Abstract

Genomic recombination is an important driving force for viral evolution. Recombination events reported for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the Coronavirus Disease 2019 pandemic have significantly altered viral infectivity and transmissibility. However, viral recombination is difficult to identify, especially in low-divergence viruses such as SARS-CoV-2, because recombination is hard to distinguish from in situ mutation. Herein, we applied information theory to viral recombination analysis and developed VirusRecom, a program for efficiently screening recombination events in viral genomes. In principle, we treated a recombination event as a transmission of "information" and introduced weighted information content (WIC) to quantify the contribution of recombination to a given region of the viral genome; we then identified recombination regions by comparing the WICs of different regions. In a benchmark using simulated data, VirusRecom showed a good balance between precision and recall compared with two competing tools, RDP5 and 3SEQ. In the detection of the SARS-CoV-2 XE, XD and XF recombinants, VirusRecom provided more accurate positions of the recombination regions than RDP5 and 3SEQ. In addition, we packaged VirusRecom as command-line-interface software for convenient use. In summary, we developed a novel approach based on information theory to identify viral recombination within highly similar sequences, providing a useful tool for monitoring viral evolution and for epidemic control.
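
The idea of comparing per-region information scores across candidate parental lineages can be illustrated with a minimal Python sketch. The abstract does not give the WIC formula, so the scoring below is an assumed stand-in rather than VirusRecom's actual definition: each site is weighted by how unevenly the lineages support the query's nucleotide (a sequence-logo-style information content), and that weight is shared among lineages in proportion to their support. The function names, the pseudocount, the window settings and the toy sequences are all hypothetical.

```python
import math

def site_scores(lineages, query, pos, pseudo=0.5):
    """Assumed stand-in for a weighted information score at one alignment site.

    lineages: dict mapping lineage name -> list of aligned sequences.
    Returns a dict mapping lineage name -> score for this site.
    """
    freqs = {}
    for name, seqs in lineages.items():
        # Smoothed frequency of the query's nucleotide within this lineage.
        hits = sum(1 for s in seqs if s[pos] == query[pos])
        freqs[name] = (hits + pseudo) / (len(seqs) + 4 * pseudo)
    total = sum(freqs.values())
    shares = {name: f / total for name, f in freqs.items()}
    # Information content of the site: zero when all lineages support the
    # query nucleotide equally, larger when support is concentrated.
    entropy = -sum(p * math.log2(p) for p in shares.values())
    info = math.log2(len(lineages)) - entropy
    return {name: p * info for name, p in shares.items()}

def best_lineage_per_window(lineages, query, window, step):
    """Sum site scores in each sliding window and report the top lineage."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        totals = {name: 0.0 for name in lineages}
        for pos in range(start, start + window):
            for name, score in site_scores(lineages, query, pos).items():
                totals[name] += score
        results.append((start, max(totals, key=totals.get)))
    return results

# Toy data (hypothetical): two lineages differing at four sites, and a
# query built from the first half of lineage A and the second half of B.
lineages = {
    "A": ["ACGTACGTACGT"] * 3,
    "B": ["AGGTCCGAACTT"] * 3,
}
query = "ACGTAC" + "GAACTT"
print(best_lineage_per_window(lineages, query, window=6, step=6))
# prints [(0, 'A'), (6, 'B')]
```

In this toy case the top-scoring lineage switches from A to B between the two windows, which is the kind of signal a region-by-region comparison would flag as a candidate breakpoint. Sites where both lineages agree with the query score zero, so only informative sites drive the result.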

Keywords: SARS-CoV-2; evolution; information theory; recombination.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Humans
  • Information Theory
  • Phylogeny
  • Recombination, Genetic
  • SARS-CoV-2* / genetics