Annotation of Full-Length Long Noncoding RNAs with Capture Long-Read Sequencing (CLS)

Methods Mol Biol. 2021:2254:133-159. doi: 10.1007/978-1-0716-1158-6_9.

Abstract

Metazoan genomes produce thousands of long noncoding RNAs (lncRNAs), of which just a small fraction have been well characterized. Understanding their biological functions requires accurate annotations, that is, maps of the precise location and structure of genes and transcripts in the genome. Current lncRNA annotations are limited by compromises between quality and size, with many gene models being fragmentary or uncatalogued. To overcome this, the GENCODE consortium has developed RNA capture long-read sequencing (CLS), an approach combining targeted RNA capture with third-generation long-read sequencing. CLS provides accurate annotations at high throughput. It eliminates the need for noisy transcriptome assembly from short reads and requires minimal manual curation. The full-length transcript models produced are of quality comparable to present-day manually curated annotations. Here we describe a detailed CLS protocol, from probe design through long-read sequencing to creation of final annotations.

Keywords: CaptureSeq; GENCODE; Genome annotation; Long-read RNA sequencing; NGS; Nanopore; Next-generation sequencing; PacBio; Targeted RNA sequencing; lncRNAs.

MeSH terms

  • Animals
  • Computational Biology / methods
  • Data Curation
  • High-Throughput Nucleotide Sequencing / methods*
  • Molecular Sequence Annotation / methods*
  • RNA, Long Noncoding / genetics*
  • Sequence Analysis, RNA

Substances

  • RNA, Long Noncoding