Background: Fusion genes are important biomarkers in cancer research because their expression can produce abnormal proteins with oncogenic properties. Long-read RNA sequencing (long-read RNA-seq), which can sequence full-length mRNA transcripts, facilitates the detection of such fusion genes. Several tools have been proposed for detecting fusion genes in long-read RNA-seq datasets derived from cancer cells. However, the high sequencing error rate in long-read RNA-seq makes fusion gene detection challenging.
Methods: To address this issue, we incorporated additional steps into our fusion detection tool to improve detection accuracy: anchoring breakpoints to exon boundaries, realigning unaligned regions, and clustering breakpoints. To evaluate the tool, we compared its detection accuracy with that of two representative existing tools, JAFFAL and FusionSeeker.
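Two of the steps named in the Methods, anchoring breakpoints to exon boundaries and clustering breakpoints, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The snapping tolerance (10 bp), the clustering window (20 bp), the greedy clustering strategy, the function names `anchor_to_exon_boundary` and `cluster_breakpoints`, and the toy coordinates are all hypothetical choices made for illustration. Realignment of unaligned regions is omitted because it depends on an external aligner.

```python
from bisect import bisect_left

# Hypothetical parameters chosen for illustration only; not taken from the paper.
ANCHOR_TOLERANCE = 10   # max distance (bp) for snapping to an exon boundary
CLUSTER_WINDOW = 20     # max distance (bp) between breakpoints in one cluster


def anchor_to_exon_boundary(pos, boundaries, tolerance=ANCHOR_TOLERANCE):
    """Snap pos to the nearest exon boundary if it lies within tolerance.

    `boundaries` must be a sorted list of exon-boundary coordinates on the
    same chromosome. Positions with no nearby boundary are returned unchanged.
    """
    i = bisect_left(boundaries, pos)
    # The nearest boundary is either the last one < pos or the first one >= pos.
    candidates = boundaries[max(i - 1, 0):i + 1]
    if not candidates:
        return pos
    nearest = min(candidates, key=lambda b: abs(b - pos))
    return nearest if abs(nearest - pos) <= tolerance else pos


def cluster_breakpoints(breakpoints, window=CLUSTER_WINDOW):
    """Greedily group fusion breakpoints whose two ends both lie within `window`.

    Each breakpoint is a tuple (chrom_5p, pos_5p, chrom_3p, pos_3p).
    The first member of each cluster serves as its representative.
    """
    clusters = []
    for bp in sorted(breakpoints):
        for cluster in clusters:
            rep = cluster[0]
            if (rep[0] == bp[0] and rep[2] == bp[2]
                    and abs(rep[1] - bp[1]) <= window
                    and abs(rep[3] - bp[3]) <= window):
                cluster.append(bp)
                break
        else:
            # No compatible cluster found: start a new one.
            clusters.append([bp])
    return clusters


# Toy example with made-up coordinates (one tuple per supporting read).
exon_boundaries = {"chrA": [1000, 1500, 2000], "chrB": [5000, 5600]}
raw_breakpoints = [
    ("chrA", 1497, "chrB", 5003),
    ("chrA", 1502, "chrB", 4998),
    ("chrA", 1800, "chrB", 5600),
]
anchored = [
    (c5, anchor_to_exon_boundary(p5, exon_boundaries[c5]),
     c3, anchor_to_exon_boundary(p3, exon_boundaries[c3]))
    for c5, p5, c3, p3 in raw_breakpoints
]
clusters = cluster_breakpoints(anchored)
for cluster in clusters:
    print(cluster[0], "supporting reads:", len(cluster))
# ('chrA', 1500, 'chrB', 5000) supporting reads: 2
# ('chrA', 1800, 'chrB', 5600) supporting reads: 1
```

In this toy run, the first two reads report breakpoints a few bases apart; anchoring snaps both onto the same exon-boundary pair, so they merge into one cluster with two supporting reads. The third breakpoint has no boundary within the tolerance, keeps its raw position, and forms its own cluster. Anchoring before clustering is one way small position errors from noisy long reads could be absorbed.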
Results: On long-read RNA-seq datasets, our tool outperformed both existing tools in fusion gene detection. We also identified potentially novel fusion genes that were consistently detected across multiple tools or datasets.
Conclusions: Applying our tool to long-read RNA-seq datasets from two different cancer cell lines demonstrated its effectiveness in detecting fusion genes.
Keywords: RNA sequencing; fusion gene; long-read sequencing.
© 2024 The Author(s). Published by IMR Press.