Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts

J Chem Inf Model. 2012 Oct 22;52(10):2609-18. doi: 10.1021/ci300111r. Epub 2012 Sep 18.

Abstract

Fragment-based expert system models of toxicological end points consist primarily of a set of substructures that are statistically related to the toxic property in question. These special substructures are often referred to as toxicity alerts, toxicophores, or biophores. They are the main building blocks and classifying units of the model, and it is important to define the chemical structural space within which the alerts are expected to produce reliable predictions. Furthermore, defining an appropriate applicability domain is required under the OECD guidelines for the validation of quantitative structure-activity relationships (QSARs). To this end, this paper describes a method to construct applicability domains for individual toxicity alerts that are part of the CASE Ultra expert system models. Defining an applicability domain for each individual alert was necessary because each CASE Ultra model comprises multiple alerts, and different alerts of a model usually represent different toxicity mechanisms and cover different structural space; an applicability domain for the overall model is therefore often not adequate. The domain for each alert was constructed using a set of fragments that were found to be statistically related to the end point in question, as opposed to using overall structural similarity or physicochemical properties. Use of the applicability domains in reducing false positive predictions is demonstrated. It is now possible to obtain ROC (receiver operating characteristic) profiles of CASE Ultra models by applying domain adherence cutoffs on the alerts identified in test chemicals. This helps in optimizing the performance of a model based on its true positive-false positive prediction trade-offs and reduces the drastic effects on predictive performance caused by the active/inactive ratio of the model's training set. None of the major currently available commercial expert systems for toxicity prediction offers the possibility to explore a model's full sensitivity-specificity spectrum; therefore, the methodology developed in this study can be of benefit in improving the predictive ability of alert-based expert systems.
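
The idea of tracing a ROC profile by varying a domain adherence cutoff can be illustrated with a minimal Python sketch. It is not the CASE Ultra implementation. It assumes that each alert found in a test chemical already carries an adherence score between 0 and 1, and that a chemical is predicted positive when at least one of its alerts meets the cutoff. The abstract does not say how adherence is scored or how alerts are combined, so both rules, the names predict and roc_points, and all data values are hypothetical.

```python
# Hypothetical test set: (true label, adherence scores of the alerts found).
# Label 1 = active, 0 = inactive; an empty list means no alert was found.
# All numbers are invented for illustration only.
TEST_SET = [
    (1, [0.9]),
    (1, [0.6, 0.3]),
    (1, []),
    (0, [0.4]),
    (0, [0.2]),
    (0, []),
]


def predict(adherences, cutoff):
    """Assumed rule: positive if at least one alert meets the cutoff."""
    return any(score >= cutoff for score in adherences)


def roc_points(test_set, cutoffs):
    """Return (cutoff, true positive rate, false positive rate) per cutoff.

    Assumes the test set contains at least one active and one inactive.
    """
    positives = sum(1 for label, _ in test_set if label == 1)
    negatives = len(test_set) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for label, adh in test_set
                 if label == 1 and predict(adh, cutoff))
        fp = sum(1 for label, adh in test_set
                 if label == 0 and predict(adh, cutoff))
        points.append((cutoff, tp / positives, fp / negatives))
    return points


if __name__ == "__main__":
    for cutoff, tpr, fpr in roc_points(TEST_SET, [0.0, 0.25, 0.5, 0.75, 1.0]):
        print(f"cutoff={cutoff:.2f}  sensitivity={tpr:.2f}  "
              f"false positive rate={fpr:.2f}")
```

In this toy data set, raising the cutoff from 0.0 to 0.5 removes both false positives while keeping the same two true positives, and raising it further starts to cost sensitivity. This is the kind of trade-off that a cutoff sweep makes visible.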

MeSH terms

  • Animals
  • Aspergillus / drug effects
  • Aspergillus / genetics
  • Biological Products / chemistry*
  • Biological Products / toxicity*
  • Computer Simulation
  • Databases, Chemical
  • Drosophila melanogaster / drug effects
  • Drosophila melanogaster / genetics
  • Models, Molecular
  • Molecular Structure
  • Mutagens / chemistry*
  • Mutagens / toxicity*
  • Mutation
  • Neurospora crassa / drug effects
  • Neurospora crassa / genetics
  • Quantitative Structure-Activity Relationship*
  • ROC Curve
  • Saccharomyces cerevisiae / drug effects
  • Saccharomyces cerevisiae / genetics
  • Salmonella typhimurium / drug effects
  • Salmonella typhimurium / genetics

Substances

  • Biological Products
  • Mutagens