Benchmarking methods and data sets for ligand enrichment assessment in virtual screening

Methods. 2015 Jan:71:146-57. doi: 10.1016/j.ymeth.2014.11.015. Epub 2014 Dec 3.

Abstract

Retrospective small-scale virtual screening (VS) on benchmarking data sets has been widely used to estimate the ligand enrichment that VS approaches will achieve in prospective (i.e. real-world) efforts. However, intrinsic differences between benchmarking sets and real screening libraries can bias this assessment. Herein, we summarize the history of benchmarking methods and data sets and highlight the three main types of bias found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm for building maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its application to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. Leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs.
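
To illustrate the ROC/AUC measure named in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes that each compound in a benchmarking set has a screening score where higher means more ligand-like, and that labels mark ligands as 1 and decoys as 0. The function name `roc_auc` and the toy scores are hypothetical. The AUC is computed as the probability that a randomly chosen ligand outscores a randomly chosen decoy, which is equivalent to the area under the ROC curve.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for a ranked benchmarking set.

    Assumptions (illustrative, not from the paper):
      - higher score = predicted more likely to be a ligand
      - label 1 = ligand (active), label 0 = decoy
    Computed pairwise: fraction of (ligand, decoy) pairs in which the
    ligand scores higher, with ties counted as half a win.
    """
    ligands = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not ligands or not decoys:
        raise ValueError("need at least one ligand and one decoy")

    wins = 0.0
    for lig in ligands:
        for dec in decoys:
            if lig > dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligands) * len(decoys))


# Toy example: three ligands and three decoys with made-up scores.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC near 0.5 means the scoring method cannot separate ligands from decoys, while values near 1.0 indicate strong enrichment. When the score is based only on simple physicochemical properties, an AUC near 0.5 is what one would hope to see for a well-matched decoy set.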

Keywords: Analogue bias; Artificial enrichment; Benchmarking methodology; Decoy sets; Ligand-based virtual screening; Structure-based virtual screening.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Area Under Curve
  • Benchmarking*
  • Drug Discovery / methods
  • Drug Evaluation, Preclinical / methods*
  • Ligands
  • ROC Curve

Substances

  • Ligands