New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling

J Chem Inf Model. 2015 Mar 23;55(3):510-28. doi: 10.1021/ci500667v. Epub 2015 Feb 19.

Abstract

Chemotypes are a new approach for representing molecules, chemical substructures and patterns, reaction rules, and reactions. Chemotypes are capable of integrating types of information beyond what is possible using current representation methods (e.g., SMARTS patterns) or reaction transformations (e.g., SMIRKS, reaction SMILES). Chemotypes are expressed in the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML). They can be encoded not only with connectivity and topology but also with properties of atoms, bonds, electronic systems, or molecules. CSRML has been developed in parallel with a public set of chemotypes, the ToxPrint chemotypes. These are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, and to represent chemical patterns and properties especially relevant to various toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML-based CSRML standard used to express chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Chemistry*
  • Data Mining*
  • Databases, Factual
  • Molecular Structure
  • Phosphoric Acids / chemistry
  • Programming Languages*
  • Software*
  • Structure-Activity Relationship
  • Toxicology / methods
  • User-Computer Interface

Substances

  • Phosphoric Acids
  • phosphoric acid