On combining recursive partitioning and simulated annealing to detect groups of biologically active compounds

Paul Blower; Michael Fligner; Joseph Verducci; Jeffrey Bjoraker

doi:10.1021/ci0101049

J Chem Inf Comput Sci. 2002 Mar-Apr;42(2):393-404. doi: 10.1021/ci0101049.

Authors

Paul Blower¹, Michael Fligner, Joseph Verducci, Jeffrey Bjoraker

Affiliation

¹ Leadscope, Inc., 1245 Kinnear Road, Columbus, Ohio 43212, USA.

PMID: 11911709
DOI: 10.1021/ci0101049

Abstract

Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining large, diverse data sets resulting from high-throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on single descriptors. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates off the most active, chemically similar group of compounds. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the merging of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
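
The core idea of the abstract, searching for a combination of binary descriptors whose joint presence splits off a highly active group, can be illustrated with a minimal sketch. The abstract does not give the objective function, move set, or cooling schedule, so everything below is an assumption made for illustration: the score (between-group sum of squares, as in an ordinary regression-tree split), the fixed combination size `k`, the single-feature swap move, the geometric cooling, and all function and parameter names are hypothetical and are not taken from the paper.

```python
import itertools
import math
import random


def split_score(subset, fingerprints, activity, min_size=3):
    """Between-group sum of squares for the split 'all features in subset present'.

    Assumed objective, not the paper's. Returns 0.0 if either side is smaller
    than min_size or if the selected group is not more active than the rest.
    """
    inside = [a for fp, a in zip(fingerprints, activity) if subset <= fp]
    n_out = len(activity) - len(inside)
    if len(inside) < min_size or n_out < min_size:
        return 0.0
    total_mean = sum(activity) / len(activity)
    in_mean = sum(inside) / len(inside)
    out_mean = (sum(activity) - sum(inside)) / n_out
    if in_mean <= out_mean:
        return 0.0
    return (len(inside) * (in_mean - total_mean) ** 2
            + n_out * (out_mean - total_mean) ** 2)


def anneal_feature_combo(fingerprints, activity, n_features, k=2,
                         steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over k-feature combinations (hypothetical settings)."""
    rng = random.Random(seed)
    current = frozenset(rng.sample(range(n_features), k))
    cur_score = split_score(current, fingerprints, activity)
    best, best_score = current, cur_score
    temp = t0
    for _ in range(steps):
        # Propose swapping one feature in the combination for one outside it.
        drop = rng.choice(sorted(current))
        add = rng.choice([f for f in range(n_features) if f not in current])
        candidate = (current - {drop}) | {add}
        cand_score = split_score(candidate, fingerprints, activity)
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements or ties, and accept
        # worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
        if cur_score > best_score:
            best, best_score = current, cur_score
        temp *= cooling  # geometric cooling
    return best, best_score


# Toy data (invented): every on/off pattern of 5 features, one compound each.
# A compound is "active" (9.0) only when features 0 and 1 are both present.
fingerprints = [frozenset(i for i, bit in enumerate(bits) if bit)
                for bits in itertools.product([0, 1], repeat=5)]
activity = [9.0 if {0, 1} <= fp else 1.0 for fp in fingerprints]

best, best_score = anneal_feature_combo(fingerprints, activity, n_features=5)
print(sorted(best), best_score)
```

On this toy set neither feature 0 nor feature 1 alone isolates the active compounds, which is the situation in which a single-descriptor split falls short and a combination search helps. In a recursive partitioning design, a search of this kind would be rerun on the compounds left over after each split to grow the tree; that recursion is omitted here for brevity.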

MeSH terms

Algorithms
Antineoplastic Agents / chemistry*
Antineoplastic Agents / pharmacology
Cell Line
Molecular Structure

Substances

Antineoplastic Agents