StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads

Genes Genomics. 2023 Dec;45(12):1599-1609. doi: 10.1007/s13258-023-01458-7. Epub 2023 Oct 14.

Abstract

Background: Reconstructing amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example, to predict drug-target (or protein-protein) interactions while accounting for an individual's genomic variations. Most existing transcriptome assemblers, however, seem poorly suited to this purpose.

Methods: In this work, we present StringFix, an annotation-guided transcriptome assembly and protein sequence reconstruction tool that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool 'fixes' the pre-annotated transcript sequences by taking small variations into account, and then produces the possible amino acid sequences that are likely to exist in the test tissue.

Results: When the outputs of existing reference-based assemblers were used as the input GTF guide, StringFix reconstructed amino acid sequences with higher precision and sensitivity than generating them directly from the recovered transcripts, for all the assemblers we tested.

Conclusion: By using StringFix together with existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.
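
Illustrative sketch: the idea described in the Methods paragraph, 'fixing' an annotated transcript sequence with small observed variations before deriving the amino acid sequence, can be pictured with a toy Python example. This is not StringFix's implementation or interface. The function names, the (position, ref, alt) variant format, and the example sequence are all assumptions made for illustration, and only single-base substitutions on an already extracted coding sequence are handled.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_substitutions(cds, variants):
    """Return a copy of `cds` with single-base substitutions applied.

    `variants` is a hypothetical list of (0-based position, ref, alt) tuples.
    """
    bases = list(cds)
    for pos, ref, alt in variants:
        if bases[pos] != ref:
            raise ValueError(f"reference base mismatch at position {pos}")
        bases[pos] = alt
    return "".join(bases)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up annotated coding sequence: ATG GCC AAG TGA
annotated_cds = "ATGGCCAAGTGA"
# Made-up sample-specific variant: C>T at position 4 turns GCC (Ala) into GTC (Val).
sample_variants = [(4, "C", "T")]

fixed_cds = apply_substitutions(annotated_cds, sample_variants)
reference_protein = translate(annotated_cds)
sample_protein = translate(fixed_cds)
print(reference_protein, sample_protein)
```

In this toy case the reference and sample-specific proteins differ by one residue, which is the kind of individual-level difference the Background paragraph refers to. A real pipeline would also need to handle insertions, deletions, strand, and splicing, none of which is attempted here.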

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • RNA-Seq
  • Sequence Analysis, RNA / methods
  • Software*
  • Transcriptome*