Fine mapping of RNA isoform diversity using an innovative targeted long-read RNA sequencing protocol with novel dedicated bioinformatics pipeline

BMC Genomics. 2024 Sep 30;25(1):909. doi: 10.1186/s12864-024-10741-0.

Abstract

Background: Resolving the structure of mRNA transcripts is a major challenge for both research and molecular diagnostics. Current approaches based on short-read RNA sequencing and RT-PCR cannot fully explore the complexity of transcript structure. Third-generation long-read sequencing addresses this problem by resolving transcript sequences directly. However, genes with low expression levels are difficult to study with whole-transcriptome sequencing. To overcome this technical limitation, we propose a novel method to capture transcripts of a gene panel using a targeted enrichment approach suitable for the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms.

Results: We designed a set of probes to capture transcripts of a panel of genes involved in hereditary breast and ovarian cancer syndrome. We present SOSTAR (iSofOrmS annoTAtoR), a versatile pipeline to assemble, quantify and annotate isoforms from long-read sequencing using a new tool specially designed for this application. The significant enrichment of transcripts by our capture protocol, together with the SOSTAR annotation, allowed the identification of 1,231 unique transcripts within the gene panel from the eight patients sequenced. The structure of these transcripts was annotated with a resolution of one base relative to a reference transcript. All major alternative splicing events of the BRCA1 and BRCA2 genes described in the literature were found. Complex splicing events such as pseudoexons were correctly annotated. SOSTAR enabled the identification of abnormal transcripts in the positive controls. In addition, a case of unexplained inheritance in a family with a history of breast and ovarian cancer was resolved by identifying an SVA retrotransposon in intron 13 of the BRCA1 gene.
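
As an illustration of what annotating an isoform "with a resolution of one base relative to a reference transcript" can mean in practice, the following minimal sketch compares exon coordinates of an assembled isoform against a reference and reports skipped exons, shifted boundaries and exons absent from the reference. It is not SOSTAR's actual algorithm or output format: the function name `annotate_isoform`, the coordinate convention and the event wording are all assumptions made for this example, and transcript start/end differences are treated like any other boundary shift.

```python
from typing import List, Tuple

# Hypothetical convention for this sketch: 1-based inclusive (start, end),
# forward strand, exons sorted by position.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] <= b[1] and a[1] >= b[0]


def annotate_isoform(reference: List[Exon], isoform: List[Exon]) -> List[str]:
    """Describe how an isoform differs from a reference, exon by exon."""
    events: List[str] = []
    for idx, ref_exon in enumerate(reference, start=1):
        hits = [ex for ex in isoform if overlaps(ex, ref_exon)]
        if not hits:
            events.append(f"exon {idx}: skipped")
            continue
        start, end = hits[0]
        # Boundary differences are reported in nucleotides, signed.
        if start != ref_exon[0]:
            events.append(f"exon {idx}: start shifted by {start - ref_exon[0]:+d} nt")
        if end != ref_exon[1]:
            events.append(f"exon {idx}: end shifted by {end - ref_exon[1]:+d} nt")
    # Isoform exons overlapping no reference exon (e.g. candidate pseudoexons).
    for ex in isoform:
        if not any(overlaps(ex, ref_exon) for ref_exon in reference):
            events.append(f"novel exon at {ex[0]}-{ex[1]}")
    return events


if __name__ == "__main__":
    # Toy coordinates, not taken from any real gene.
    reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
    isoform = [(100, 200), (300, 390), (450, 470), (700, 800)]
    for event in annotate_isoform(reference, isoform):
        print(event)
```

With the toy coordinates above, the sketch reports a 10-nt shortening at the end of exon 2, the skipping of exon 3, and an exon at 450-470 that is absent from the reference.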

Conclusions: We have validated a new protocol for enriching transcripts of interest using probes adapted to the ONT and PacBio platforms. This protocol allows a complete description of alternative transcript structures, estimation of their expression, and identification of aberrant transcripts in a single experiment. This proof of concept opens new possibilities for exploring RNA structure in both research and molecular diagnostics.

Keywords: Automatic annotation; HBOC; Isoform assembly; Long read sequencing; RNA splicing.

MeSH terms

  • Alternative Splicing
  • BRCA1 Protein / genetics
  • BRCA2 Protein / genetics
  • Computational Biology* / methods
  • Female
  • Hereditary Breast and Ovarian Cancer Syndrome / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • RNA Isoforms* / genetics
  • Sequence Analysis, RNA* / methods

Substances

  • RNA Isoforms
  • BRCA2 Protein
  • BRCA1 Protein