Comparison of methods based on diversity and similarity for molecule selection and the analysis of drug discovery data

Raymond L H Lam; William J Welch

doi:10.1385/1-59259-802-1:301


Methods Mol Biol. 2004:275:301-16. doi: 10.1385/1-59259-802-1:301.

Authors

Raymond L H Lam¹, William J Welch

Affiliation

¹ Department of Data Exploration Sciences, GlaxoSmithKline, King of Prussia, Pennsylvania, USA.

PMID: 15141118

Abstract

The concepts of diversity and similarity of molecules are widely used in quantitative methods for designing (selecting) a representative set of molecules and for analyzing the relationship between chemical structure and biological activity. We review methods and algorithms for design of a diverse set of molecules in the chemical space using clustering, cell-based partitioning, or other distance-based approaches. Analogous cell-based and clustering methods are described for analyzing drug-discovery data to predict activity in virtual screening. Some performance comparisons are made. The choice of descriptor variables to characterize chemical structure is also included in the comparative study. We find that the diversity of a selected set is quite sensitive to both the statistical selection method and the choice of molecular descriptors and that, for the dataset used in this study, random selection works surprisingly well in providing a set of data for analysis.
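
As an illustration of the distance-based approaches to diverse selection mentioned above, the following is a minimal sketch of a greedy maximin selection. It is not taken from the chapter: the function names `maximin_select` and `min_pairwise_distance`, the Euclidean metric, and the choice of starting molecule are all assumptions made for the example, and each molecule is assumed to be already encoded as a numeric descriptor vector.

```python
import math


def maximin_select(points, k, start=0):
    """Greedily pick k indices so that each new pick maximizes its
    distance to the nearest already-selected point (maximin criterion).

    points: sequence of equal-length numeric descriptor vectors (assumed).
    start:  index of the first selected point (an arbitrary assumption).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The candidate farthest from the current selection is added next.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances to account for the new pick.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def min_pairwise_distance(points, indices):
    """Smallest distance between any two selected points; larger values
    indicate a more spread-out (more diverse) selection."""
    return min(
        math.dist(points[a], points[b])
        for n, a in enumerate(indices)
        for b in indices[n + 1:]
    )
```

One way to use this sketch is to compute `min_pairwise_distance` for the maximin selection and for a random subset of the same size (for example from `random.sample`), which gives a rough sense of how much spread the selection method adds under a given set of descriptors.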

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Drug Design*
Structure-Activity Relationship