Cov-trans: an efficient algorithm for discontinuous transcript assembly in coronaviruses

BMC Genomics. 2024 Dec 30;25(1):1257. doi: 10.1186/s12864-024-11179-0.

Abstract

Background: Discontinuous transcription allows coronaviruses to efficiently replicate and transmit within host cells, enhancing their adaptability and survival. Assembling viral transcripts is crucial for virology research and the development of antiviral strategies. However, traditional transcript assembly methods, which are primarily designed for variable alternative splicing events in eukaryotes, are not suitable for the viral transcript assembly problem. Current algorithms designed for assembling viral transcripts often struggle with low accuracy in determining transcript boundaries. There is an urgent need for a highly accurate viral transcript assembly algorithm.

Results: In this work, we propose Cov-trans, a reference-based transcript assembler specifically tailored for the discontinuous transcription of coronaviruses. Cov-trans first identifies canonical transcripts based on discontinuous transcription mechanisms, start and stop codons, and read alignment information. Subsequently, it formulates the assembly of non-canonical transcripts as a path extraction problem and introduces a mixed integer linear programming model to recover these non-canonical transcripts.
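
To make the first step more concrete, the following is a minimal sketch of how junction-spanning reads could be separated into canonical and non-canonical candidates. It is not the Cov-trans implementation. The function names (`first_orf`, `classify_junctions`), the parameters `leader_end` and `min_support`, the rule that a canonical junction must leave the leader region and be followed by an ORF, and the toy genome are all assumptions made for illustration. Positions are 0-based, and ORF ends are exclusive.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(genome, search_from):
    """Return (start, end) of the first ATG at or after search_from that has
    an in-frame stop codon, or None if no such ORF exists (end is exclusive)."""
    start = genome.find("ATG", search_from)
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(genome) - 2, 3):
            if genome[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = genome.find("ATG", start + 1)
    return None


def classify_junctions(genome, junction_reads, leader_end, min_support=2):
    """Split well-supported junctions into canonical and non-canonical lists.

    junction_reads: iterable of (donor, acceptor) pairs, one per spliced read.
    Hypothetical rule: a junction is canonical if its donor lies in the leader
    region (donor <= leader_end) and an ORF is found downstream of the acceptor.
    """
    support = Counter(junction_reads)
    canonical, non_canonical = [], []
    for (donor, acceptor), count in sorted(support.items()):
        if count < min_support:
            continue  # too few reads to trust this junction
        orf = first_orf(genome, acceptor)
        record = {"donor": donor, "acceptor": acceptor, "reads": count, "orf": orf}
        if donor <= leader_end and orf is not None:
            canonical.append(record)
        else:
            non_canonical.append(record)
    return canonical, non_canonical


# Synthetic toy genome: a 10-nt "leader", a spacer, then a short ORF (ATG GCC TAA).
genome = "GGACGAACTT" + "CCCCCCCCCC" + "ACGAACATGGCCTAAGG" + "CCCC"
reads = [(8, 20)] * 3 + [(15, 20)] * 2 + [(8, 37)]
canonical, non_canonical = classify_junctions(genome, reads, leader_end=9)
print(canonical)
print(non_canonical)
```

In this sketch, the junctions left in `non_canonical` are the ones that would be handed to the path extraction step. The mixed integer linear programming formulation itself is not reproduced here, since the abstract does not specify it.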

Conclusion: Experimental results show that Cov-trans outperforms other assemblers in both accuracy and recall, with a notable strength in accurately identifying the boundaries of transcripts. Cov-trans is freely available at https://github.com/computer-Bioinfo/Cov-trans.git .

Keywords: Coronaviruses; Discontinuous transcription; Mixed integer linear programming; Reference-based assembly.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Coronavirus* / genetics
  • RNA, Viral / genetics
  • RNA, Viral / metabolism
  • Sequence Analysis, RNA / methods
  • Software
  • Transcription, Genetic

Substances

  • RNA, Viral