Fusion genes are well-known cancer drivers. However, most known oncogenic fusions are protein-coding, and very few involve non-coding sequences, owing to a lack of suitable detection tools. We develop SFyNCS to detect fusions of both protein-coding genes and non-coding sequences from transcriptomic sequencing data. The main advantage of this study is that we use somatic structural variations detected from genomic data to validate fusions detected from transcriptomic data. This allows us to systematically evaluate a range of fusion detection and filtering strategies and parameters. Through extensive benchmarking in cancer cell lines and patient samples, we show that SFyNCS achieves higher sensitivity and specificity than existing algorithms. We then apply SFyNCS to 9565 tumor samples across 33 tumor types in The Cancer Genome Atlas cohort and detect a total of 165,139 fusions, 72% of which involve non-coding sequences. We find that a long non-coding RNA is recurrently fused with various oncogenes in 3% of prostate cancers. In addition, we discover fusions involving two non-coding RNAs in 32% of dedifferentiated liposarcomas and experimentally validate their oncogenic function in a mouse model.
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.