Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens

Sci Rep. 2017 Oct 24;7(1):13899. doi: 10.1038/s41598-017-12825-2.

Abstract

High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules so that the target molecules making up a library can be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general-purpose one billion DNA barcode library, the largest such library yet reported in the literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
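The four constraints named in the abstract can be illustrated with a small sketch. It is not the authors' framework: it shows only a simple random-candidate baseline, and the thresholds (GC range, maximum homopolymer run, minimum Hamming distance) as well as all function names are hypothetical choices made for illustration.

```python
import random
import re

def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))

def hamming(a, b):
    """Number of mismatching positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def passes_filters(seq, gc_range=(0.4, 0.6), max_run=2, blacklist=()):
    """Per-barcode checks; the default thresholds are assumed, not from the paper."""
    if not gc_range[0] <= gc_content(seq) <= gc_range[1]:
        return False
    if max_homopolymer(seq) > max_run:
        return False
    return not any(sub in seq for sub in blacklist)

def naive_library(length, size, min_dist, seed=0, max_tries=100_000, **filters):
    """Draw random candidates and keep those that pass the per-barcode
    filters and lie at least min_dist from every barcode kept so far."""
    rng = random.Random(seed)
    library = []
    for _ in range(max_tries):
        if len(library) == size:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if passes_filters(cand, **filters) and all(
            hamming(cand, kept) >= min_dist for kept in library
        ):
            library.append(cand)
    return library

if __name__ == "__main__":
    for barcode in naive_library(length=8, size=10, min_dist=3):
        print(barcode, gc_content(barcode), max_homopolymer(barcode))
```

In this baseline every candidate is compared against every barcode already accepted, so the number of Hamming-distance comparisons grows roughly quadratically with library size. That cost makes an all-pairs check of this kind impractical at the million- and billion-barcode scales described above.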

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Barcoding, Taxonomic*
  • Gene Library*
  • High-Throughput Nucleotide Sequencing*
  • Sequence Analysis, DNA