The impact of RNA secondary structure on read start locations on the Illumina sequencing platform

PLoS One. 2017 Feb 28;12(2):e0173023. doi: 10.1371/journal.pone.0173023. eCollection 2017.

Abstract

High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is GC content bias, in which the level of GC content in a genomic region affects the number of reads produced during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the underlying molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias.

MeSH terms

  • Base Composition / genetics
  • Genome, Bacterial / genetics
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Protein Structure, Secondary
  • RNA / chemistry*
  • RNA / genetics
  • Sequence Analysis, DNA

Substances

  • RNA