Predicting long non-coding RNAs using RNA sequencing

Methods. 2013 Sep 1;63(1):50-9. doi: 10.1016/j.ymeth.2013.03.019. Epub 2013 Mar 27.

Abstract

The advent of next-generation sequencing, and in particular RNA-sequencing (RNA-seq), technologies has expanded our knowledge of the transcriptional capacity of human and other animal genomes. In particular, recent RNA-seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts from both within, and intervening between, known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations of their cell-type and developmental time-point restricted expression patterns. A variety of sequencing protocols are currently available for identifying lncRNAs, including RNA polymerase II occupancy, chromatin state maps and - the focus of this review - deep RNA sequencing. In addition, there are numerous analytical methods available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-seq data. Here we review current methods for identifying lncRNAs using large-scale sequencing data from RNA-seq experiments and highlight analytical considerations that are required when undertaking such projects.

Keywords: Long non-coding RNA; Next generation sequencing; RNA-seq; lncRNAs.

Publication types

  • Review

MeSH terms

  • Base Sequence
  • Chromatin / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • RNA Polymerase II / genetics
  • RNA, Long Noncoding / genetics
  • RNA, Long Noncoding / isolation & purification*
  • Transcription, Genetic*

Substances

  • Chromatin
  • RNA, Long Noncoding
  • RNA Polymerase II