Biases in small RNA deep sequencing data

Nucleic Acids Res. 2014 Feb;42(3):1414-26. doi: 10.1093/nar/gkt1021. Epub 2013 Nov 5.

Abstract

High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce the background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economical parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data have uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appear artificially enhanced while others are diminished or even undetectable. The use of different adapters and barcodes during ligation, as well as complex RNA structures and modifications, drastically influences cDNA synthesis efficiency and exemplifies the sources of bias in deep sequencing. In addition, variation in the G/C content of individual RNAs is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.
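To illustrate why unequal amplification efficiencies matter, the following minimal Python sketch uses the idealized textbook model of PCR, in which each cycle multiplies the copy number by (1 + efficiency). It is not taken from the review: the function names, the example sequences, and the efficiency and cycle values are hypothetical and chosen only for illustration, and no quantitative relationship between G/C content and efficiency is assumed.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def amplified_copies(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: every cycle multiplies copies by (1 + efficiency).

    efficiency is the per-cycle fraction of templates that are duplicated,
    so 1.0 means perfect doubling and 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


def apparent_fold_bias(eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of final copies of RNA A to RNA B when both start at equal abundance."""
    return amplified_copies(1.0, eff_a, cycles) / amplified_copies(1.0, eff_b, cycles)


if __name__ == "__main__":
    # Hypothetical sequences, used only to show the G/C calculation.
    for name, seq in [("rna_a", "AUGCAUAUAUGC"), ("rna_b", "GGCCGCGGCAGC")]:
        print(name, round(gc_fraction(seq), 2))

    # Hypothetical per-cycle efficiencies and cycle number.
    print(apparent_fold_bias(0.95, 0.85, 15))
```

Under these assumed values, two RNAs that start at equal abundance end up differing by a little more than twofold after 15 cycles, even though their per-cycle efficiencies differ by only 0.10. This shows how a modest sequence-dependent difference can compound into an apparent expression change.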

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Polymerase Chain Reaction
  • Precision Medicine
  • RNA, Messenger / chemistry
  • RNA, Messenger / metabolism
  • RNA, Small Untranslated / chemistry
  • RNA, Small Untranslated / metabolism*
  • Sequence Analysis, RNA / methods*

Substances

  • RNA, Messenger
  • RNA, Small Untranslated