The Long Read Transcriptome of Rice (Oryza sativa ssp. japonica var. Nipponbare) Reveals Novel Transcripts

Sharmin Hasan; Lichun Huang; Qiaoquan Liu; Virginie Perlo; Angela O'Keeffe; Gabriel Rodrigues Alves Margarido; Agnelo Furtado; Robert J Henry

doi:10.1186/s12284-022-00577-1

Rice (N Y). 2022 Jun 11;15(1):29. doi: 10.1186/s12284-022-00577-1.

Authors

Sharmin Hasan¹ ², Lichun Huang³, Qiaoquan Liu³, Virginie Perlo¹, Angela O'Keeffe¹, Gabriel Rodrigues Alves Margarido⁴, Agnelo Furtado¹, Robert J Henry⁵ ⁶

Affiliations

¹ Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia.
² Department of Botany, Jagannath University, Dhaka, 1100, Bangladesh.
³ College of Agriculture, Yangzhou University, Jiangsu, 225009, China.
⁴ Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Universidade de São Paulo, Piracicaba, São Paulo, 13418-900, Brazil.
⁵ Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia.
⁶ ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia.

Abstract

Background: High-throughput next-generation sequencing technologies offer a powerful approach to characterizing the transcriptomes of plants. Long-read sequencing has been shown to support the discovery of novel transcript isoforms. This approach generates full-length sequences, revealing splice variants that may be important in regulating gene action. To improve the annotation of the rice genome, the diversity of transcripts in the rice transcriptome, including splice variants, was investigated using PacBio long-read sequence data.

Results: A cDNA library was prepared from RNA extracted from leaves, roots, seeds, inflorescences, and panicles of O. sativa ssp. japonica var. Nipponbare and sequenced on a PacBio Sequel platform. This produced 346,190 non-redundant full-length non-chimeric (FLNC) reads, resulting in 33,504 high-quality transcripts. Half of the transcripts were multi-exonic and entirely matched the reference transcripts. However, 14,874 novel isoforms were also identified, resulting predominantly from intron retention and from at least one novel splice site. Intron retention was the most prevalent alternative splicing event and exon skipping was the least observed. Of 73,659 splice junctions, 12,755 (17%) were novel, with both canonical and non-canonical intron boundaries. The complexity of the transcriptome was examined in detail for 19 starch synthesis-related genes, defining 276 spliced isoforms, of which 94 were novel splice variants.
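The proportions quoted in the Results can be checked with a few lines of arithmetic. The sketch below is only an illustration using the counts reported above; the helper name `percent` and the variable names are hypothetical and are not part of the authors' analysis pipeline.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts taken directly from the Results paragraph.
novel_junctions, total_junctions = 12_755, 73_659   # splice junctions
novel_starch, total_starch = 94, 276                # isoforms of 19 starch synthesis-related genes

junction_pct = percent(novel_junctions, total_junctions)
starch_pct = percent(novel_starch, total_starch)

print(f"Novel splice junctions: {junction_pct}%")
print(f"Novel starch-gene isoforms: {starch_pct}%")
```

The first figure agrees with the 17% stated in the abstract. The second is not stated in the abstract and is derived here only from the two reported counts.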

Conclusion: The data reveal the great complexity of the rice transcriptome. The novel transcripts provide new insights that may be a key input in future research to improve the annotation of the rice genome.

Keywords: Alternative splicing isoforms; Full-length transcripts; Iso-sequencing; Novel isoforms; Rice transcriptome; Splicing junctions.