Semblans: Automated assembly and processing of RNA-Seq data

Bioinformatics. 2025 Jan 9:btaf003. doi: 10.1093/bioinformatics/btaf003. Online ahead of print.

Abstract

Motivation: Recent advancements in parallel sequencing methods have precipitated a surge in publicly available short-read sequence data. This has encouraged the development of novel computational tools for the de novo assembly of transcriptomes from RNA-seq data. Despite the availability of these tools, performing an end-to-end transcriptome assembly remains a programmatically involved task necessitating familiarity with best practices. Quality control steps, including error correction, adapter trimming, and chimera filtration, must be correctly employed, and moving data between programs often requires manual reformatting or restructuring, which can further impede throughput. Here, we introduce Semblans, a tool for streamlining the assembly process that efficiently and consistently produces high-quality transcriptome assemblies.

Results: Semblans abstracts the key quality control, reconstitution, and postprocessing steps of transcriptome assembly from raw short-read sequences to annotated coding sequences. Evaluating its performance against previously assembled transcriptomes on the basis of assembly quality, we find that Semblans produced higher quality assemblies for 98 of the 101 short-read runs tested.

Availability: Semblans is written in C++ and runs on Unix-compliant operating systems. Source code, documentation, and compiled binaries are hosted under the GNU General Public License at https://github.com/gladshire/Semblans.

Supplementary information: Supplementary data are available at Bioinformatics online.