A long-read RNA-seq approach to identify novel transcripts of very large genes

Prech Uapinyoying; Jeremy Goecks; Susan M Knoblach; Karuna Panchapakesan; Carsten G Bonnemann; Terence A Partridge; Jyoti K Jaiswal; Eric P Hoffman

doi:10.1101/gr.259903.119

Genome Res. 2020 Jun;30(6):885-897. doi: 10.1101/gr.259903.119. Epub 2020 Jul 6.

Authors

Prech Uapinyoying¹,²,³, Jeremy Goecks⁴, Susan M Knoblach¹,², Karuna Panchapakesan¹, Carsten G Bonnemann¹,³, Terence A Partridge¹,², Jyoti K Jaiswal¹,², Eric P Hoffman¹,⁵

Affiliations

¹ Center for Genetic Medicine Research, Children's Research Institute, Children's National Health System, Washington, D.C. 20010, USA.
² Department of Genomics and Precision Medicine, The George Washington University School of Medicine and Health Sciences, Washington, D.C. 20052, USA.
³ Neuromuscular and Neurogenetic Disorders of Childhood Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA.
⁴ Computational Biology Program, Oregon Health and Science University, Portland, Oregon 97239, USA.
⁵ Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, Binghamton University, Binghamton, New York 13902, USA.

Abstract

RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that span at most two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing mitigates this problem, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle and in fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. The analysis also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.
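To make the phasing problem described in the abstract concrete, the following is a minimal sketch and not the authors' analysis method. It assumes each read has already been reduced to an ordered tuple of exon numbers; the exon numbers, the two toy samples, and the function names are all hypothetical. It also assumes the shared middle exon is longer than a short read, so that short reads effectively see one junction at a time.

```python
from collections import Counter


def count_junctions(reads):
    """Count adjacent exon pairs, the evidence a junction-level view provides."""
    return Counter(
        pair for read in reads for pair in zip(read, read[1:])
    )


def count_exon_chains(reads):
    """Count complete ordered exon chains, as seen by full-length long reads."""
    return Counter(tuple(read) for read in reads)


# Hypothetical toy data: each tuple is one read's ordered exon chain.
# Both samples use the same junctions, but pair the alternative
# exons (2 or 3, then 5 or 6) differently within a transcript.
sample_a = [(1, 2, 4, 5, 7), (1, 3, 4, 6, 7)]
sample_b = [(1, 2, 4, 6, 7), (1, 3, 4, 5, 7)]

same_junctions = count_junctions(sample_a) == count_junctions(sample_b)
same_chains = count_exon_chains(sample_a) == count_exon_chains(sample_b)
print("junction counts identical:", same_junctions)
print("exon chains identical:", same_chains)
```

In this toy case the junction counts of the two samples match exactly while the exon chains differ, which is the kind of in-phase information that only reads spanning the whole chain can supply.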

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Alternative Splicing
Computational Biology / methods
Exons
Gene Expression Profiling* / methods
High-Throughput Nucleotide Sequencing*
Humans
Molecular Sequence Annotation
Organ Specificity / genetics
RNA, Messenger*
Repetitive Sequences, Nucleic Acid
Sequence Analysis, RNA*
Transcriptome*

Substances

RNA, Messenger
