Comparing Molecular Patterns Using the Example of SMARTS: Theory and Algorithms

J Chem Inf Model. 2019 Jun 24;59(6):2560-2571. doi: 10.1021/acs.jcim.9b00250. Epub 2019 May 23.

Abstract

Molecular patterns are widely used for compound filtering in molecular design. They describe structural features that are associated with unwanted physical or chemical properties such as reactivity or toxicity. With filter sets comprising hundreds of structural filters, an analytical approach to comparing these patterns is needed. Here we present a novel approach to solving the generic pattern comparison problem. We introduce chemically inspired fingerprints for pattern nodes and edges to derive an easy-to-compare pattern representation. We then apply a maximum common subgraph algorithm to two annotated pattern graphs, which enables the calculation of pattern inclusion and similarity. The resulting algorithm can be used in many different ways: we can automatically derive pattern hierarchies or search large pattern collections for more general or more specific patterns. To the best of our knowledge, the presented algorithm is the first of its kind to enable these types of chemical pattern analytics. Our new tool, SMARTScompare, implements the approach for the SMARTS language, which is the de facto standard for structural filters. We demonstrate the capabilities of SMARTScompare on a large collection of SMARTS patterns from real applications.
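
The idea of comparing patterns through node fingerprints can be illustrated with a small, self-contained Python sketch. It is not the SMARTScompare implementation: the toy atom "universe", the helper names (`node_fingerprint`, `make_pattern`, `pattern_includes`, `node_similarity`) and the brute-force mapping search are all assumptions made for illustration, edge fingerprints are ignored, and a plain subgraph mapping stands in for the maximum common subgraph step described in the abstract. Each pattern node is represented by the set of atom states it accepts, so "more specific" becomes a subset test.

```python
from itertools import permutations, product

# Hypothetical toy universe of atom states: (element, is_aromatic).
ELEMENTS = ("C", "N", "O", "S")
UNIVERSE = frozenset(product(ELEMENTS, (False, True)))


def node_fingerprint(predicate):
    """Set of atom states accepted by a node predicate (the node's fingerprint)."""
    return frozenset(state for state in UNIVERSE if predicate(*state))


def node_similarity(fp_a, fp_b):
    """Jaccard similarity of two node fingerprints (an illustrative choice)."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def make_pattern(nodes, edges):
    """A pattern is a tuple of node fingerprints plus a set of undirected edges."""
    return tuple(nodes), frozenset(frozenset(edge) for edge in edges)


def pattern_includes(general, specific):
    """True if `general` can be mapped into `specific` such that every mapped
    node of `specific` accepts a subset of the atom states of its `general`
    counterpart and every edge of `general` is present in `specific`.
    Brute force over injective mappings; only suitable for tiny patterns."""
    g_nodes, g_edges = general
    s_nodes, s_edges = specific
    for image in permutations(range(len(s_nodes)), len(g_nodes)):
        nodes_ok = all(s_nodes[image[i]] <= g_nodes[i] for i in range(len(g_nodes)))
        edges_ok = all(
            frozenset(image[u] for u in edge) in s_edges for edge in g_edges
        )
        if nodes_ok and edges_ok:
            return True
    return False


# Toy node predicates, loosely mirroring the SMARTS primitives '*', 'a', 'n', 'C'.
any_atom = node_fingerprint(lambda element, aromatic: True)
aromatic_atom = node_fingerprint(lambda element, aromatic: aromatic)
aromatic_n = node_fingerprint(lambda element, aromatic: element == "N" and aromatic)
aliphatic_c = node_fingerprint(lambda element, aromatic: element == "C" and not aromatic)

# "aromatic atom bonded to any atom" vs. "aromatic N bonded to aliphatic C".
general = make_pattern([aromatic_atom, any_atom], [(0, 1)])
specific = make_pattern([aromatic_n, aliphatic_c], [(0, 1)])

print(pattern_includes(general, specific))  # general subsumes specific
print(pattern_includes(specific, general))  # the reverse does not hold
print(node_similarity(aromatic_n, aromatic_atom))
```

Representing fingerprints as sets keeps the inclusion test a single subset comparison per node pair; a real implementation would need a far richer atom and bond description and a more efficient graph-matching procedure than enumerating all mappings.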

MeSH terms

  • Algorithms
  • Cheminformatics / methods
  • Pattern Recognition, Automated / methods
  • Small Molecule Libraries / chemistry*
  • Software*

Substances

  • Small Molecule Libraries