Diagnostics of viral infections using high-throughput genome sequencing data

Brief Bioinform. 2024 Sep 23;25(6):bbae501. doi: 10.1093/bib/bbae501.

Abstract

Plant viral infections cause significant economic losses, totalling US$350 billion in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. Some methods require manual or subjective verification to address these artifacts, which is labour-intensive, while others overlook them entirely, which makes their results imprecise. To address these challenges, we introduce IIMI, a new automated analysis pipeline that uses machine learning to diagnose infections by 1583 plant viruses from HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.
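
The idea of filtering artifact-prone regions before diagnosis can be illustrated with a minimal Python sketch. It is not the iimi package's API or the authors' implementation: the function names, the half-open interval convention, the `min_depth` parameter, and the two summary features are all assumptions made for illustration. The sketch masks annotated unreliable regions of a per-position read-depth profile and then summarises the remaining positions, the kind of features a downstream classifier could be trained on.

```python
from typing import Dict, List, Tuple


def keep_mask(length: int, unreliable: List[Tuple[int, int]]) -> List[bool]:
    """Return True for positions outside every unreliable region.

    Regions are half-open intervals [start, end) in 0-based coordinates
    (an assumed convention for this sketch).
    """
    keep = [True] * length
    for start, end in unreliable:
        for i in range(max(0, start), min(length, end)):
            keep[i] = False
    return keep


def coverage_features(
    depth: List[int],
    unreliable: List[Tuple[int, int]],
    min_depth: int = 1,  # hypothetical cutoff for calling a position "covered"
) -> Dict[str, float]:
    """Summarise read depth over the positions that survive masking."""
    keep = keep_mask(len(depth), unreliable)
    kept = [d for d, k in zip(depth, keep) if k]
    if not kept:
        return {"n_positions": 0, "fraction_covered": 0.0, "mean_depth": 0.0}
    covered = sum(1 for d in kept if d >= min_depth)
    return {
        "n_positions": len(kept),
        "fraction_covered": covered / len(kept),
        "mean_depth": sum(kept) / len(kept),
    }


# Toy profile: a 10-position genome with a false peak at positions 3 to 5.
depth = [0, 0, 0, 90, 95, 80, 0, 0, 1, 0]
raw = coverage_features(depth, unreliable=[])
masked = coverage_features(depth, unreliable=[(3, 6)])
print(raw)
print(masked)
```

In this toy profile the false peak dominates the unmasked summary, and masking it leaves a profile with almost no coverage. In a data-driven pipeline, cutoffs such as `min_depth` would be learned from labelled samples rather than set by hand.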

Keywords: artifacts in genomic mapping; clean plant program; genome mappability; machine learning; read mapping; virus diagnosis.

MeSH terms

  • Computational Biology / methods
  • Genome, Viral
  • High-Throughput Nucleotide Sequencing* / methods
  • Machine Learning
  • Plant Diseases / virology
  • Plant Viruses / genetics
  • Software
  • Virus Diseases / diagnosis
  • Virus Diseases / virology