Gene Fusion Detection in Long-Read Transcriptome Datasets from Multiple Cancer Cell Lines

Keigo Masuda; Yoshiaki Sota; Hideo Matsuda

doi:10.31083/j.fbl2912413


Front Biosci (Landmark Ed). 2024 Dec 11;29(12):413. doi: 10.31083/j.fbl2912413.

Authors

Keigo Masuda¹, Yoshiaki Sota², Hideo Matsuda¹

Affiliations

¹ Graduate School of Information Science and Technology, Osaka University, 565-0871 Suita, Osaka, Japan.
² Graduate School of Medicine, Osaka University, 565-0871 Suita, Osaka, Japan.

PMID: 39735992

Abstract

Background: Fusion genes are important biomarkers in cancer research because their expression can produce abnormal proteins with oncogenic properties. Long-read RNA sequencing (long-read RNA-seq), which can sequence full-length mRNA transcripts, facilitates the detection of such fusion genes. Several tools have been proposed for detecting fusion genes in long-read RNA-seq datasets derived from cancer cells. However, the high sequencing error rate in long-read RNA-seq makes fusion gene detection challenging.

Methods: To address this issue, we added three steps to our fusion detection tool to improve its accuracy: anchoring breakpoints to exon boundaries, realigning unaligned regions, and clustering breakpoints. To evaluate the tool, we compared its detection accuracy with that of two representative existing tools, JAFFAL and FusionSeeker.
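The abstract names these steps without describing how they are implemented. The following is a minimal sketch, not the authors' implementation, of how two of them (anchoring and clustering) could work under stated assumptions: each chimeric read yields one breakpoint pair (5' partner position, 3' partner position), exon boundaries are available as sorted coordinate lists per chromosome, a breakpoint within a hypothetical `tolerance` of a boundary is moved onto it, and anchored breakpoints whose two ends both lie within a hypothetical `window` of a cluster's first member are merged. All names (`Breakpoint`, `snap_to_exon_boundary`, `anchor`, `cluster_breakpoints`) are illustrative. Realignment of unaligned read segments is omitted because it depends on an external aligner.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical fusion breakpoint: 5' partner position and 3' partner position."""
    chrom5: str
    pos5: int
    chrom3: str
    pos3: int


def snap_to_exon_boundary(pos, boundaries, tolerance):
    """Move pos to the nearest exon boundary within tolerance; otherwise keep it.

    boundaries must be sorted ascending (assumption of this sketch).
    """
    i = bisect_left(boundaries, pos)
    # Only the boundaries immediately left and right of pos can be the nearest.
    candidates = [boundaries[j] for j in (i - 1, i) if 0 <= j < len(boundaries)]
    if not candidates:
        return pos
    nearest = min(candidates, key=lambda b: abs(b - pos))
    return nearest if abs(nearest - pos) <= tolerance else pos


def anchor(bp, exon_boundaries, tolerance):
    """Anchor both ends of a breakpoint to exon boundaries on their chromosomes."""
    return Breakpoint(
        bp.chrom5,
        snap_to_exon_boundary(bp.pos5, exon_boundaries.get(bp.chrom5, []), tolerance),
        bp.chrom3,
        snap_to_exon_boundary(bp.pos3, exon_boundaries.get(bp.chrom3, []), tolerance),
    )


def cluster_breakpoints(breakpoints, window):
    """Greedily merge breakpoints whose ends lie within window of a cluster's first member.

    Returns (representative breakpoint, number of supporting reads) pairs.
    """
    clusters = []  # each entry: (representative, list of member breakpoints)
    ordered = sorted(breakpoints, key=lambda b: (b.chrom5, b.chrom3, b.pos5, b.pos3))
    for bp in ordered:
        for rep, members in clusters:
            if (rep.chrom5 == bp.chrom5 and rep.chrom3 == bp.chrom3
                    and abs(rep.pos5 - bp.pos5) <= window
                    and abs(rep.pos3 - bp.pos3) <= window):
                members.append(bp)
                break
        else:
            # No compatible cluster found: this breakpoint starts a new one.
            clusters.append((bp, [bp]))
    return [(rep, len(members)) for rep, members in clusters]


if __name__ == "__main__":
    # Toy exon boundaries and read-level breakpoints (made-up coordinates).
    exon_boundaries = {"chr1": [1000, 1500, 2000], "chr2": [5000, 5400]}
    reads = [
        Breakpoint("chr1", 1497, "chr2", 5003),
        Breakpoint("chr1", 1502, "chr2", 4998),
        Breakpoint("chr1", 1500, "chr2", 5000),
        Breakpoint("chr1", 1800, "chr2", 5400),
    ]
    anchored = [anchor(bp, exon_boundaries, tolerance=10) for bp in reads]
    for rep, support in cluster_breakpoints(anchored, window=20):
        print(rep, support)
```

In this toy input, anchoring collapses three error-shifted breakpoints onto the same exon junction, so clustering reports them as a single candidate supported by three reads; this illustrates why anchoring before clustering can help when sequencing errors blur breakpoint positions. The tolerance and window values are arbitrary here.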

Results: On long-read RNA-seq datasets, our tool detected fusion genes more accurately than both existing tools. We also identified potentially novel fusion genes that were consistently detected across multiple tools or datasets.

Conclusions: Applying our tool to long-read RNA-seq datasets from two different cancer cell lines demonstrated its effectiveness in detecting fusion genes.

Keywords: RNA sequencing; fusion gene; long-read sequencing.

MeSH terms

Cell Line, Tumor
Gene Expression Profiling / methods
Gene Fusion*
Humans
Neoplasms* / genetics
Oncogene Proteins, Fusion / genetics
Sequence Analysis, RNA / methods
Transcriptome* / genetics

Substances

Oncogene Proteins, Fusion

Grants and funding

JP21K19827 Japan/JSPS KAKENHI