Detecting virus integration sites based on multiple related sequencing data by VirTect

BMC Med Genomics. 2019 Jan 31;12(Suppl 1):19. doi: 10.1186/s12920-018-0461-8.

Abstract

Background: Because tumors often show a high level of intra-tumor heterogeneity, multiple tumor samples from the same patient, taken at different locations or time points, are often sequenced to study this heterogeneity or tumor evolution. In virus-related tumors, such as human papillomavirus (HPV)- and hepatitis B virus (HBV)-related tumors, integration of the virus genome can be a critical driving event. It is thus important to investigate the integration sites of the virus genomes. A few algorithms for detecting virus integration sites from high-throughput sequencing data have been developed, but their limited sensitivity and specificity and their high computational cost hinder their application to sequencing data from multiple related tumor samples.

Results: We develop VirTect for detecting virus integration sites simultaneously from data on multiple related samples. The algorithm is based mainly on the joint analysis of short reads that span the breakpoints of integration sites across multiple samples. To achieve high specificity and breakpoint accuracy, a local precise sandwich alignment algorithm is used. Simulation and real data analyses show that, compared with other algorithms, VirTect is significantly more sensitive and has a similar or lower false discovery rate.
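
The idea of jointly analyzing breakpoint-spanning reads from related samples can be illustrated with a minimal sketch. It is not VirTect's implementation: the function names, the input format (per-sample lists of candidate human-side breakpoint positions taken from split reads), and the `window` and `min_support` parameters are assumptions made for illustration only. The sketch pools candidates from all samples, clusters nearby positions, and keeps clusters whose combined read support reaches a threshold.

```python
from collections import defaultdict


def summarize_cluster(chrom, cluster, min_support):
    """Return one site for a cluster of (position, sample) hits, or nothing
    if the pooled read support is below min_support (hypothetical helper)."""
    if len(cluster) < min_support:
        return []
    positions = sorted(pos for pos, _ in cluster)
    median_pos = positions[len(positions) // 2]
    per_sample = defaultdict(int)
    for _, sample in cluster:
        per_sample[sample] += 1
    return [(chrom, median_pos, dict(per_sample))]


def joint_breakpoints(samples, window=10, min_support=3):
    """Pool candidate breakpoints from all samples and cluster them.

    samples: dict mapping sample name -> list of (chrom, position) tuples,
             one tuple per breakpoint-spanning read (assumed input format).
    window:  maximum gap in bp between neighboring hits in one cluster.
    """
    # Pool hits from every sample, grouped by chromosome.
    pooled = defaultdict(list)
    for sample, hits in samples.items():
        for chrom, pos in hits:
            pooled[chrom].append((pos, sample))

    sites = []
    for chrom, hits in pooled.items():
        hits.sort()
        cluster = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - cluster[-1][0] <= window:
                cluster.append(hit)  # close enough: same candidate site
            else:
                sites.extend(summarize_cluster(chrom, cluster, min_support))
                cluster = [hit]
        sites.extend(summarize_cluster(chrom, cluster, min_support))
    return sites


# Toy data: neither sample alone reaches three supporting reads at chr1.
example = {
    "T1": [("chr1", 1000), ("chr1", 1002)],
    "T2": [("chr1", 1001), ("chr5", 500)],
}
print(joint_breakpoints(example))
```

With the toy input, the chr1 site is reported with two supporting reads from T1 and one from T2, while the single chr5 read is discarded. Pooling is what lets a site with weak per-sample support pass the threshold; a real caller would also need to check the virus-side position and the alignment quality of each read.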

Conclusions: VirTect can provide more accurate breakpoint positions and is computationally much more efficient in terms of both memory requirement and computation time.

Keywords: HBV; HPV; Hidden Markov model; Paired-end reads; Split reads.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genome, Human / genetics
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Virus Integration*
  • Workflow