Gene Fusion Detection in Long-Read Transcriptome Datasets from Multiple Cancer Cell Lines

Front Biosci (Landmark Ed). 2024 Dec 11;29(12):413. doi: 10.31083/j.fbl2912413.

Abstract

Background: Fusion genes are important biomarkers in cancer research because their expression can produce abnormal proteins with oncogenic properties. Long-read RNA sequencing (long-read RNA-seq), which can sequence full-length mRNA transcripts, facilitates the detection of such fusion genes. Several tools have been proposed for detecting fusion genes in long-read RNA-seq datasets derived from cancer cells. However, the high sequencing error rate in long-read RNA-seq makes fusion gene detection challenging.

Methods: To address this issue, we incorporated additional steps into our fusion detection tool to improve detection accuracy: anchoring breakpoints to exon boundaries, realigning unaligned regions, and clustering breakpoints. To evaluate the tool, we compared its detection accuracy with that of two representative existing tools, JAFFAL and FusionSeeker.
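
As a rough illustration of two of the steps named above (breakpoint anchoring and breakpoint clustering), the following Python sketch shows one way such steps could work. The abstract does not describe the actual algorithm, so everything here is an assumption made for illustration: the function names, the `max_shift` and `window` parameters, and the toy coordinates are hypothetical and are not taken from the authors' tool. The realignment step is not sketched.

```python
from bisect import bisect_left


def anchor_to_exon_boundary(pos, boundaries, max_shift=10):
    """Snap a breakpoint to the nearest exon boundary if it is close enough.

    `boundaries` must be a sorted list of exon boundary coordinates.
    `max_shift` is a hypothetical tolerance (in bases) meant to absorb
    small offsets caused by sequencing or alignment errors.
    """
    if not boundaries:
        return pos
    i = bisect_left(boundaries, pos)
    # Only the boundaries immediately left and right of pos can be nearest.
    candidates = boundaries[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda b: abs(b - pos))
    return nearest if abs(nearest - pos) <= max_shift else pos


def cluster_breakpoints(positions, window=20):
    """Group breakpoints so that neighbours within `window` bases share a cluster.

    This is a simple greedy pass over the sorted positions; `window` is
    again a hypothetical parameter.
    """
    clusters = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= window:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


# Toy example with made-up coordinates on a single gene.
exon_boundaries = [1000, 1200, 5000]
raw_breakpoints = [1003, 998, 1207, 4300]
anchored = [anchor_to_exon_boundary(p, exon_boundaries) for p in raw_breakpoints]
clusters = cluster_breakpoints(anchored)
print(anchored)   # [1000, 1000, 1200, 4300]
print(clusters)   # [[1000, 1000], [1200], [4300]]
```

In this toy run, the two reads with breakpoints at 1003 and 998 collapse onto the same exon boundary and so support the same candidate junction, while the breakpoint at 4300 is left unanchored because no boundary lies within the tolerance.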

Results: Our tool outperformed the two existing tools in detecting fusion genes, as demonstrated in long-read RNA-seq datasets. We also identified potentially novel fusion genes consistently detected across multiple tools or datasets.

Conclusions: Applying our tool to long-read RNA-seq datasets from two different cancer cell lines demonstrated its effectiveness in detecting fusion genes.

Keywords: RNA sequencing; fusion gene; long-read sequencing.

MeSH terms

  • Cell Line, Tumor
  • Gene Expression Profiling / methods
  • Gene Fusion*
  • Humans
  • Neoplasms* / genetics
  • Oncogene Proteins, Fusion / genetics
  • Sequence Analysis, RNA / methods
  • Transcriptome* / genetics

Substances

  • Oncogene Proteins, Fusion