Conformational sampling for large-scale virtual screening: accuracy versus ensemble size

J Chem Inf Model. 2009 Oct;49(10):2303-11. doi: 10.1021/ci9002415.

Abstract

We introduce the TrixX Conformer Generator (TCG), a novel tool for generating conformational ensembles. The tool specifically addresses the requirements of large-scale computer-aided drug design applications that use conformer databases. For these, the trade-off between accuracy, i.e., RMSD to biologically active conformers, and database size, i.e., the number of conformers in an ensemble, is of central interest. Based on a tree data structure representing the molecule, conformations are generated incrementally in a best-first-search build-up process employing an internal RMSD clustering. In this way, TCG builds conformational ensembles of low-energy conformers, using conformational energy as a scoring function. A crucial parameter is the amount of search space to be covered in the build-up process. This parameter is determined by an exponential function with a user-specified quality level as its base and an exponent that depends on the molecule's flexibility. The quality level allows the user to set the aforementioned trade-off while taking into account the exponentially growing number of combinations of torsion angles. Tested on a set of 778 molecules, we show that on average 20 conformers per ensemble suffice to achieve an average accuracy of 1.13 Å. We observed that an improvement in accuracy is accompanied by an exponential rise in the number of conformations per ensemble (e.g., 100 conformations per ensemble yield an accuracy of 0.99 Å). Furthermore, we show that for molecules with fewer than nine rotatable bonds, ensembles with an average accuracy better than 1 Å can be generated with an average ensemble size of 20 conformers. However, this value deteriorates for more flexible molecules. A comparison to CATALYST and OMEGA shows that TCG achieves comparable performance in terms of accuracy. Furthermore, it performs well with respect to the trade-off between accuracy and ensemble size.
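
The build-up idea described above can be illustrated with a toy sketch. This is not the TCG implementation: the abstract does not give the budget formula, the energy function, or the clustering details, so everything below is an assumption made for illustration. In particular, `search_budget` assumes the exponent equals the number of rotatable bonds, `torsion_energy` is a made-up threefold potential, ties in energy are broken in favour of more complete partial conformers, and a torsion-angle distance stands in for the internal RMSD clustering.

```python
import heapq
import itertools
import math


def search_budget(quality_level, n_rotatable):
    """Hypothetical budget: quality level as base, flexibility as exponent.

    Assumption: the exponent is simply the number of rotatable bonds.
    """
    return max(1, math.ceil(quality_level ** n_rotatable))


def torsion_energy(angle_deg):
    """Toy threefold torsion potential with minima at 60, 180 and 300 degrees."""
    return round(1.0 + math.cos(math.radians(3 * angle_deg)), 6)


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def is_duplicate(conf, ensemble, threshold):
    """Stand-in for RMSD clustering: reject conformers whose torsions all lie
    within `threshold` degrees of an already accepted conformer."""
    return any(
        all(angular_distance(x, y) < threshold for x, y in zip(conf, other))
        for other in ensemble
    )


def build_ensemble(n_rotatable, quality_level,
                   angles=(0, 60, 120, 180, 240, 300), threshold=30.0):
    """Best-first incremental build-up over torsion assignments."""
    budget = search_budget(quality_level, n_rotatable)
    counter = itertools.count()  # final tie-breaker so tuples always compare
    # Heap entries: (partial energy, -depth, insertion order, torsion tuple).
    heap = [(0.0, 0, next(counter), ())]
    ensemble = []
    expansions = 0
    while heap and expansions < budget:
        energy, _, _, partial = heapq.heappop(heap)
        if len(partial) == n_rotatable:
            # Complete conformer: keep it unless it is a near-duplicate.
            if not is_duplicate(partial, ensemble, threshold):
                ensemble.append(partial)
            continue
        # Expand the lowest-energy partial conformer by one more torsion.
        expansions += 1
        for angle in angles:
            child = partial + (angle,)
            heapq.heappush(
                heap,
                (energy + torsion_energy(angle), -len(child), next(counter), child),
            )
    return ensemble


if __name__ == "__main__":
    for q in (1.5, 2.0, 3.0):
        print(q, len(build_ensemble(n_rotatable=3, quality_level=q)))
```

In this sketch, raising the quality level only lets the same search run for more expansions, so the ensemble can grow but never shrink. That mirrors, in a simplified form, the accuracy versus ensemble size trade-off the quality level is meant to control.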

MeSH terms

  • Algorithms
  • Databases, Factual
  • Drug Evaluation, Preclinical / methods*
  • Models, Molecular
  • Molecular Conformation*
  • Time Factors
  • User-Computer Interface*