Understanding Data Noise and Uncertainty through Analysis of Replicate Samples in DNA-Encoded Library Selection

J Chem Inf Model. 2022 May 9;62(9):2239-2247. doi: 10.1021/acs.jcim.1c00986. Epub 2021 Dec 4.

Abstract

An approach for estimating the noise level of a DNA-Encoded Library (DEL) selection experiment has been developed by analyzing data sets of replicate selections. Using a logarithmic transformation of the read counts associated with each compound, together with the subset of compounds with the highest counts, it is possible to assess the quality of the data by normalizing the replicates and to use the same data to estimate the noise in the experiment. The noise level is seen to depend on sequencing depth as well as on the specific selection conditions. The noise estimate is independent of any cutoff used to remove low-frequency compounds from the data analysis. Removing compounds with only 1-5 read counts eases some of the challenges encountered in DEL data analysis, as it can shrink the data set more than 100-fold without affecting the interpretation of the results.
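The abstract does not give the exact procedure, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes two replicates stored as dictionaries mapping a compound identifier to its read count, takes the median log10 ratio over the highest-count compounds as the normalization offset between replicates, and reports the standard deviation of those log ratios as the noise estimate. The function names `estimate_noise` and `filter_low_counts`, the `top_n` parameter, and the choice of median and standard deviation are all illustrative assumptions.

```python
import math
from statistics import median, stdev


def filter_low_counts(counts, cutoff=5):
    """Drop compounds whose read count is at or below `cutoff` (hypothetical helper)."""
    return {cid: n for cid, n in counts.items() if n > cutoff}


def estimate_noise(rep_a, rep_b, top_n=100):
    """Estimate a normalization offset and noise level from two replicates.

    rep_a, rep_b: dict mapping compound id -> read count (assumed layout).
    Returns (offset, noise), both in log10 units.
    """
    # Only compounds observed in both replicates have a defined log ratio.
    shared = [c for c in rep_a if c in rep_b and rep_a[c] > 0 and rep_b[c] > 0]
    # Keep the subset of compounds with the highest combined counts.
    shared.sort(key=lambda c: rep_a[c] + rep_b[c], reverse=True)
    top = shared[:top_n]
    if len(top) < 2:
        raise ValueError("need at least two shared compounds")
    # Log-transform the counts and take the per-compound difference.
    log_ratios = [math.log10(rep_a[c]) - math.log10(rep_b[c]) for c in top]
    # Assumption: the median log ratio serves as the replicate normalization.
    offset = median(log_ratios)
    # Assumption: the spread of the log ratios serves as the noise estimate.
    noise = stdev(log_ratios)
    return offset, noise
```

Because this sketch estimates the noise only from the highest-count compounds, applying `filter_low_counts` to both replicates first leaves the result unchanged as long as the cutoff lies below the counts of the selected compounds, which is consistent with the cutoff independence the abstract describes.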

MeSH terms

  • DNA*
  • Data Analysis
  • Small Molecule Libraries*
  • Uncertainty

Substances

  • Small Molecule Libraries
  • DNA