DNA-encoded libraries (DELs) use unique DNA sequences to tag each chemical warhead in a library mixture, enabling deconvolution after affinity selection against a target protein. With next-generation sequencing, millions to billions of sequences can be read and counted to report binding events. This capability has enabled researchers to synthesize and analyze numerically large chemical libraries. Despite the common perception that each library member undergoes a miniaturized affinity assay, selections with higher-complexity libraries often produce results that are difficult to rank-order. In this study, we aimed to understand the robustness of DEL selection by examining the sequencing readouts of warheads and chemotype families across a large number of experimentally repeated selections. The results revealed that (1) the output of DEL selection is intrinsically noisy but can be reliably modeled by the Poisson distribution, and (2) Poisson noise is the dominant source of noise at low copy counts and can be estimated even from a single experiment. We also discuss the shortcomings of data analyses that directly use copy counts or their linear transformations, and we propose a framework that incorporates proper normalization and confidence interval calculation to help researchers better understand DEL data.
Keywords: DEL; DNA Encoded Library; Poisson; data analysis; normalization.
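
To illustrate why a Poisson model lets noise be estimated from a single experiment, the following minimal sketch computes an exact (Garwood-style) confidence interval for the Poisson mean behind one observed copy count. It is an illustration under stated assumptions, not the framework proposed in the paper: the function names `poisson_cdf`, `poisson_ci`, and `normalized_interval` are hypothetical, and normalizing by an expected count per library member (for example, total reads divided by library size) is one plausible choice that we assume here.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term.

    Intended for the low copy counts typical of DEL data; exp(-lam)
    underflows for very large lam.
    """
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def poisson_ci(k, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the Poisson mean given count k."""
    # Lower bound: the mean at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, float(k)
        for _ in range(100):
            mid = (lo + hi) / 2
            if 1.0 - poisson_cdf(k - 1, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2

    # Upper bound: the mean at which P(X <= k) equals alpha / 2.
    lo, hi = float(k), k + 10 * math.sqrt(k + 1) + 10
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    upper = (lo + hi) / 2
    return lower, upper


def normalized_interval(k, expected, alpha=0.05):
    """Scale the count and its interval by an expected count.

    ASSUMPTION: 'expected' is the mean reads per library member
    (e.g. total_reads / library_size); the paper may normalize differently.
    """
    lower, upper = poisson_ci(k, alpha)
    return k / expected, lower / expected, upper / expected


if __name__ == "__main__":
    for count in (0, 1, 10):
        low, high = poisson_ci(count)
        print(f"count={count:>2}  95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

Under this model an observed count of 10 is compatible with an underlying mean anywhere from roughly 4.8 to 18.4, and a count of 0 still allows a mean of up to about 3.7. Intervals this wide relative to the counts themselves suggest why rank-ordering library members by raw low copy counts can be unreliable.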