The ongoing COVID-19 pandemic has not only caused a high number of casualties worldwide, but also poses an unprecedented challenge for scientists. False-positive results in virus detection tests not only aggravate the situation in the healthcare sector, but also provide grounds for speculation. Previous studies have highlighted the importance of software choice and data interpretation in virome studies. We aimed to further expand theoretical and practical knowledge in bioinformatics-driven virome studies by focusing on short, virus-like DNA sequences in metagenomic data. Analyses of datasets obtained from different sample types (terrestrial, animal- and human-related samples) and origins showed that coronavirus-like sequences were already present in host-associated and environmental samples before the current COVID-19 pandemic. The analyzed datasets contained various Betacoronavirus-like sequences, including matches to SARS-CoV-2. In-depth analyses indicated that the detected sequences are not of viral origin and thus should not be considered in virome profiling approaches. Our study confirms the importance of parameter selection, especially in terms of read length, for reliable virome profiling. Natural environments are an important source of coronavirus-like nucleotide sequences that should be taken into account when virome datasets are analyzed and interpreted. We therefore suggest that processing parameters be carefully selected for SARS-CoV-2 profiling in host-related as well as environmental samples to avoid incorrect identifications.
Keywords: COVID-19; Coronaviruses; Metagenomics; SARS-CoV-2; Virome profiling.
Copyright © 2021 The Authors. Published by Elsevier B.V. All rights reserved.