Identification of alternatively spliced gene isoforms and novel noncoding RNAs by single-molecule long-read sequencing in Camellia

RNA Biol. 2020 Jul;17(7):966-976. doi: 10.1080/15476286.2020.1738703. Epub 2020 Mar 19.

Abstract

Direct single-molecule sequencing of full-length transcripts allows efficient identification of gene isoforms, which makes it well suited to analyses of alternative splicing (AS), polyadenylation, and long non-coding RNAs. However, identifying gene isoforms and long non-coding RNAs with novel regulatory functions remains challenging, especially in species without a reference genome. Here, we present a comprehensive analysis of combined long-read and short-read transcriptome sequencing in Camellia japonica. Using a novel bioinformatic pipeline that reverse-traces the split-sites, we uncovered 257,692 AS sites from 61,838 transcripts, and 13,068 AS isoforms were validated by aligning the short reads. We identified tissue-specific AS isoforms along with 6,373 AS events that were found in all tissues. Furthermore, we analysed the polyadenylation (polyA) patterns of transcripts and found that the preference for polyA signals differed between AS and non-AS transcripts. Moreover, we predicted phased small interfering RNA (phasiRNA) loci through integrative analyses of transcriptome and small RNA sequencing. We show that a newly evolved phasiRNA locus derived from lipoxygenases generated 12 consecutive 21 bp secondary RNAs, which were responsive to cold and heat stress in Camellia. Our study of the isoform transcriptome provides insights into gene splicing and function that may facilitate a mechanistic understanding of plants.
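
The phasing pattern described above (12 consecutive secondary RNAs of length 21) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `phase_windows` and `in_phase_fraction`, the start coordinate, and the read positions are all hypothetical, and the sketch ignores strand offsets that a real phasing analysis would need to handle.

```python
PHASE_LEN = 21  # length of each secondary RNA, as stated in the abstract
N_PHASES = 12   # number of consecutive secondary RNAs reported for the locus


def phase_windows(start, n_phases=N_PHASES, phase_len=PHASE_LEN):
    """Return 0-based, half-open (begin, end) windows for consecutive phases.

    `start` is a hypothetical coordinate of the first phased position.
    """
    return [
        (start + i * phase_len, start + (i + 1) * phase_len)
        for i in range(n_phases)
    ]


def in_phase_fraction(read_starts, start, phase_len=PHASE_LEN):
    """Fraction of small-RNA read start positions that lie on the register.

    A read is counted as in phase when its distance from `start` is an
    exact multiple of `phase_len` (single-strand simplification).
    """
    if not read_starts:
        return 0.0
    hits = sum(1 for pos in read_starts if (pos - start) % phase_len == 0)
    return hits / len(read_starts)


if __name__ == "__main__":
    # Made-up read start positions for illustration only.
    example_reads = [100, 121, 142, 150, 163]
    windows = phase_windows(100)
    print(windows[0], windows[-1])
    print(in_phase_fraction(example_reads, 100))
```

A high in-phase fraction across a run of windows is the kind of signal that suggests a phased locus; the threshold for calling one would depend on the scoring method actually used, which the abstract does not specify.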

Keywords: Camellia; Alternative splicing; lipoxygenase; phased small interfering RNA; polyadenylation; single-molecule sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing*
  • Camellia / genetics*
  • Computational Biology
  • Gene Expression Profiling
  • Gene Expression Regulation, Plant*
  • Genome, Plant
  • High-Throughput Nucleotide Sequencing*
  • Molecular Sequence Annotation
  • Phenotype
  • Polyadenylation
  • RNA Isoforms
  • RNA, Untranslated / genetics*
  • Single Molecule Imaging*
  • Transcriptome

Substances

  • RNA Isoforms
  • RNA, Untranslated

Grants and funding

This work was supported by Nonprofit Research Projects (CAFYBB2017SZ001) of the Chinese Academy of Forestry and by the National Science Foundation of China (NSFC Grant 31870578).