Sequencing on the SOLiD 5500xl System - in-depth characterization of the GC bias

Nucleus. 2017 Jul 4;8(4):370-380. doi: 10.1080/19491034.2017.1320461. Epub 2017 Apr 27.

Abstract

Different types of sequencing biases have been described and subsequently mitigated for a variety of sequencing systems, mostly focusing on the widely used Illumina systems. Similar studies are missing for the SOLiD 5500xl system, a sequencer that produced many data sets available to researchers today. Describing and understanding the bias is important for accurately interpreting these published data and integrating them into ongoing research projects. We report a particularly strong GC bias for this sequencing system, observed when analyzing a defined gDNA mix of 5 microbes spanning a wide range of GC contents (20-72%) and comparing the results to the expected distribution and to Illumina MiSeq data from the same DNA pool. Because the bias was already present under PCR-free conditions, changing the PCR conditions during library preparation - a common strategy to handle bias in the Illumina system - was not relevant. The source of the bias appeared to be an uneven heat distribution during the SOLiD emulsion PCR (ePCR), which is used to enrich libraries prior to loading, since performing ePCR in either small pouches or 96-well plates reduced the GC bias. Sequencing of chromatin immunoprecipitated DNA (ChIP-seq) is a common approach in epigenetics. ChIP-seq of the mixed source histone mark H3K9ac (acetyl Histone H3 lysine 9), typically found on promoter regions and on gene bodies, including CpG islands, performed on a SOLiD 5500xl machine, resulted in a major loss of reads at GC-rich loci (GC content ≥ 62%) that was not explained by low sequencing depth. This loss was reduced by adapting the ePCR.
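The comparison described above, observed read distribution against the expected distribution for a defined genome mix, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content` and `coverage_ratios`, the genome labels, and all numbers in the usage note are hypothetical and serve only to show the arithmetic. The sketch assumes that reads have already been assigned to genomes and that the expected fraction of reads per genome is known from the composition of the DNA pool.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences without any unambiguous base get 0.0 rather than an error.
    return gc / total if total else 0.0


def coverage_ratios(expected_fraction: dict, read_counts: dict) -> dict:
    """Return the observed/expected read share per genome.

    A ratio of 1.0 means the genome is represented as expected; values
    below 1.0 indicate under-representation, values above 1.0 indicate
    over-representation.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no reads were counted")
    ratios = {}
    for name, expected in expected_fraction.items():
        observed = read_counts.get(name, 0) / total_reads
        ratios[name] = observed / expected
    return ratios
```

As a usage illustration with made-up numbers, `coverage_ratios({"low_gc": 0.5, "high_gc": 0.5}, {"low_gc": 75, "high_gc": 25})` gives a ratio of 1.5 for the hypothetical low-GC genome and 0.5 for the high-GC genome. Relating such ratios to each genome's `gc_content` is one simple way to summarize a GC bias.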

Keywords: CpG island; GC bias; H3K9ac; PCR-free library preparation; chromatin immunoprecipitation (ChIP); emulsion polymerase chain reaction (ePCR); microbial genomic DNA; next generation sequencing (NGS); sequencing depth; upscale PCR.

MeSH terms

  • Base Composition*
  • Sequence Analysis, DNA / standards
  • Sequence Analysis, DNA / trends*