VERSO: A comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples

Patterns (N Y). 2021 Mar 12;2(3):100212. doi: 10.1016/j.patter.2021.100212. Epub 2021 Jan 28.

Abstract

We introduce VERSO, a two-step framework for characterizing viral evolution from sequencing data of viral genomes, which improves on phylogenomic approaches based on consensus sequences. VERSO first exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, even under sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, its application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of the SARS-CoV-2 mutational landscape uncovers a temporal increase in overall genomic diversity and highlights variants transitioning from a minor to a clonal state, as well as homoplastic variants, some of which fall on the spike gene. VERSO is available at: https://github.com/BIMIB-DISCo/VERSO.

Keywords: COVID-19; SARS-CoV-2; genomic surveillance; intra-host genomic diversity; phylogenomics; viral evolution; viral variants.
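
Illustrative sketch (not part of the published abstract): the two-step idea described above, clonal variant profiles first and minor-variant frequency patterns second, can be pictured with a small toy example. Everything in it is an assumption made for illustration: the frequency thresholds, the function names, the sample names, and the variant labels are hypothetical, and grouping samples by identical clonal genotype is only a simple stand-in for the phylogenetic inference that VERSO actually performs. It is not the VERSO implementation or its API.

```python
from itertools import combinations

# Hypothetical thresholds on variant frequency (VF); assumptions for this
# sketch only, not values taken from the abstract.
CLONAL_VF = 0.90   # above this, a variant is treated as clonal
MINOR_VF = 0.05    # at or below this, a variant is treated as noise


def split_variants(profile):
    """Split a {variant: frequency} profile into clonal and minor sets."""
    clonal = {v for v, f in profile.items() if f > CLONAL_VF}
    minor = {v for v, f in profile.items() if MINOR_VF < f <= CLONAL_VF}
    return clonal, minor


def group_by_clonal_genotype(samples):
    """Step 1 stand-in: group samples sharing the same clonal variants.

    This replaces the phylogeny inference with a plain grouping, purely
    to keep the example short.
    """
    groups = {}
    for name, profile in samples.items():
        clonal, _ = split_variants(profile)
        groups.setdefault(frozenset(clonal), []).append(name)
    return groups


def shared_minor_variants(samples, names):
    """Step 2 stand-in: minor variants co-occurring in each sample pair."""
    shared = {}
    for a, b in combinations(sorted(names), 2):
        _, minor_a = split_variants(samples[a])
        _, minor_b = split_variants(samples[b])
        shared[(a, b)] = minor_a & minor_b
    return shared


# Made-up toy data: sample -> {variant label: variant frequency}.
samples = {
    "s1": {"v1": 0.99, "v2": 0.20, "v3": 0.02},
    "s2": {"v1": 0.97, "v2": 0.35},
    "s3": {"v1": 0.98, "v4": 0.95},
}

groups = group_by_clonal_genotype(samples)
same_genotype = groups[frozenset({"v1"})]
shared = shared_minor_variants(samples, same_genotype)
```

In this toy data, samples s1 and s2 share the same clonal genotype and also the minor variant v2. That is the kind of co-occurrence pattern the abstract describes as hinting at an infection path that clonal variants alone would not reveal.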