Uncovering 1058 Novel Human Enteric DNA Viruses Through Deep Long-Read Third-Generation Sequencing and Their Clinical Impact

Gastroenterology. 2022 Sep;163(3):699-711. doi: 10.1053/j.gastro.2022.05.048. Epub 2022 Jun 6.

Abstract

Background & aims: The lack of viral reference genomes poses a challenge to virome studies. We investigated the human gut virome and its clinical implications by ultra-deep metagenomic sequencing.

Methods: We extracted sufficient viral DNA from human feces for ultra-deep PacBio sequencing (>10 μg) and Illumina sequencing (>1 μg). After de novo assembly and 6 stages of strict filtering, viral genomes were generated and validated in 3 cohorts comprising 2819 published fecal metagenomes. The diagnostic performance of the assembled viruses for colorectal cancer was tested in a training cohort and 2 independent validation cohorts. Virus mapping ratio, evolutionary history, and virus status (lytic or temperate) were also examined.
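The virus mapping ratio examined here can be illustrated with a minimal sketch. It assumes the ratio is the fraction of a metagenome's reads that align to a viral reference set, and that improvement is reported as the ratio after adding new genomes divided by the ratio before. The abstract does not give the exact definition, so the function names and the numbers below are hypothetical.

```python
def mapping_ratio(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads that align to the reference set (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads


def fold_improvement(ratio_before: float, ratio_after: float) -> float:
    """How many times larger the mapping ratio is after extending the reference."""
    if ratio_before <= 0:
        raise ValueError("ratio_before must be positive")
    return ratio_after / ratio_before


if __name__ == "__main__":
    # Hypothetical read counts for one metagenome, not taken from the study.
    before = mapping_ratio(mapped_reads=10_000, total_reads=1_000_000)
    after = mapping_ratio(mapped_reads=180_000, total_reads=1_000_000)
    print(f"before={before:.3f} after={after:.3f} "
          f"fold={fold_improvement(before, after):.1f}")
```

In practice the read counts would come from an aligner's summary statistics; the sketch only shows the arithmetic.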

Results: The mean amount of extracted viral DNA increased by 14-fold compared with previous protocols. We obtained PacBio long reads and Illumina short reads with 290-fold higher depth than previous studies. We assembled and validated 1178 contigs as complete viral genomes, of which 1058 were newly identified. We discovered 13 viral genomes (398-839 kb) that are longer than the largest bacteriophage previously found in humans (393 kb). A phylogenetic tree was constructed based on hidden Markov model alignment scores of 4 conserved viral proteins. Incorporating our assembled genomes into the National Center for Biotechnology Information database improved the mapping ratio of published metagenomes by up to 18-fold. Lytic viruses (75.9% ± 12.2% of total) were predominant in our sample. A biomarker panel of 14 novel viruses could discriminate patients with colorectal cancer from controls with an area under the receiver operating characteristic curve of 0.87 in the training cohort, which was validated with areas under the receiver operating characteristic curve of 0.85 and 0.73 in 2 independent cohorts.
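As a reading aid for the reported areas under the receiver operating characteristic curve, the following minimal sketch computes that area from per-subject risk scores using its rank interpretation: the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The scores are hypothetical, and the abstract does not describe how the 14-virus panel was combined into a score, so this is not the authors' classifier.

```python
from typing import Sequence


def auroc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # a tie counts as half
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical panel scores, for illustration only.
    patients = [0.9, 0.8, 0.4]
    controls = [0.5, 0.3, 0.2]
    print(f"AUROC = {auroc(patients, controls):.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which puts the reported 0.87, 0.85, and 0.73 in context.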

Conclusions: We uncovered 1058 novel human gut viruses. These findings can contribute to clinical diagnosis, expand the current set of viral reference genomes, and support future virome investigations.

Keywords: Colorectal Cancer; Diagnostic Biomarker; Gut Virome; PacBio Sequencing; Ultra-Deep Metagenomic Sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Colorectal Neoplasms / genetics
  • DNA Viruses / genetics
  • DNA, Viral / genetics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Metagenome
  • Metagenomics / methods
  • Phylogeny
  • Viruses* / genetics

Substances

  • DNA, Viral