On combining recursive partitioning and simulated annealing to detect groups of biologically active compounds

J Chem Inf Comput Sci. 2002 Mar-Apr;42(2):393-404. doi: 10.1021/ci0101049.

Abstract

Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining large, diverse data sets resulting from high-throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on single descriptors. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates off the most active, chemically similar group of compounds. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the merging of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
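
To make the idea of searching for descriptor combinations concrete, the following is a minimal Python sketch of a simulated annealing search over fixed-size sets of binary descriptors. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the objective function, the move set, or the cooling schedule, so the score used here (mean activity excess of the selected group, scaled by the square root of its size), the single-descriptor swap move, the geometric cooling, the fixed combination size k, and the names split_score and anneal_combo are all hypothetical. The toy fingerprints and activities are invented for the example. Only the search within a single node is shown; the recursive tree-growing step is omitted.

```python
import math
import random

# Toy data (invented): rows are compounds, columns are binary descriptors.
X = [
    [1, 1, 0, 1, 0], [0, 1, 0, 1, 1], [0, 1, 1, 1, 0], [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0], [1, 0, 1, 0, 1], [0, 1, 0, 0, 1], [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 1], [1, 0, 1, 0, 0],
]
y = [9.0, 8.5, 9.5, 2.0, 1.5, 1.0, 2.5, 2.0, 1.0, 1.5]  # activities


def split_score(X, y, combo):
    """Assumed objective: (group mean - overall mean) * sqrt(group size),
    where the group is every compound containing all descriptors in combo."""
    members = [i for i, row in enumerate(X) if all(row[j] for j in combo)]
    if not members:
        return 0.0  # an empty group separates nothing
    group_mean = sum(y[i] for i in members) / len(members)
    overall_mean = sum(y) / len(y)
    return (group_mean - overall_mean) * math.sqrt(len(members))


def anneal_combo(X, y, k=2, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over k-descriptor combinations (requires k < n_desc)."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    current = rng.sample(range(n_desc), k)
    cur_score = split_score(X, y, current)
    best, best_score = list(current), cur_score
    t = t0
    for _ in range(steps):
        # Move: swap one descriptor in the combination for one outside it.
        candidate = list(current)
        outside = [j for j in range(n_desc) if j not in current]
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_score = split_score(X, y, candidate)
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        t *= cooling  # geometric cooling (an assumed schedule)
    return sorted(best), best_score


combo, score = anneal_combo(X, y)
print(combo, round(score, 2))  # for this toy data: [1, 3] 8.92
```

In this toy set, the three compounds that carry descriptors 1 and 3 together are the most active, so the joint presence of the pair separates them more cleanly than any single descriptor or other pair does. In a tree-building setting, a search of this kind would be rerun on the compounds remaining after each split.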

MeSH terms

  • Algorithms
  • Antineoplastic Agents / chemistry*
  • Antineoplastic Agents / pharmacology
  • Cell Line
  • Molecular Structure

Substances

  • Antineoplastic Agents