Novel naïve Bayes classification models for predicting the carcinogenicity of chemicals

Food Chem Toxicol. 2016 Nov:97:141-149. doi: 10.1016/j.fct.2016.09.005. Epub 2016 Sep 3.

Abstract

Carcinogenicity prediction has become a significant issue for the pharmaceutical industry. The purpose of this investigation was to develop a novel model for predicting the carcinogenicity of chemicals using a naïve Bayes classifier. The established model was validated by internal 5-fold cross-validation and an external test set. The naïve Bayes classifier gave an average overall prediction accuracy of 90 ± 0.8% for the training set and 68 ± 1.9% for the external test set. Moreover, five simple molecular descriptors (AlogP, molecular weight (MW), number of H donors, Apol and Wiener) considered important for the carcinogenicity of chemicals were identified, along with some substructures related to carcinogenicity. We hope the established naïve Bayes prediction model can be applied to filter early-stage molecules for potential carcinogenicity. The five identified molecular descriptors and the carcinogen substructures should give a better understanding of the carcinogenicity of chemicals and provide guidance for medicinal chemists in the design of new candidate drugs and in lead optimization, ultimately reducing the attrition rate in later stages of drug development.
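The validation scheme described in the abstract (a naïve Bayes classifier checked by internal 5-fold cross-validation and an external test set) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain Bernoulli naïve Bayes with Laplace smoothing over binary fingerprint bits, and the 8-bit fingerprints, labels, and all function names (`train_bernoulli_nb`, `cross_validate`, `make_toy_data`) are hypothetical and synthetic, standing in for ECFP_14 fingerprints and real carcinogenicity data.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Laplace-smoothed probability that each bit is set within class c.
        p_on = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_features)]
        model[c] = (math.log(len(rows) / len(X)), p_on)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for fingerprint x."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, p_on) in model.items():
        score = log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, p_on))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the label."""
    return sum(predict(model, x) == label for x, label in zip(X, y)) / len(y)


def cross_validate(X, y, k=5, seed=0):
    """Internal k-fold cross-validation; returns one accuracy per fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_bernoulli_nb([X[i] for i in train],
                                   [y[i] for i in train])
        scores.append(accuracy(model,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return scores


def make_toy_data(n, seed):
    """Synthetic 8-bit fingerprints (made up): class 1 is enriched in bits 0-2."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [int(rng.random() < (0.8 if (label == 1 and j < 3) else 0.2))
             for j in range(8)]
        X.append(x)
        y.append(label)
    return X, y


# Hypothetical training set and a separately generated "external" test set.
X_train, y_train = make_toy_data(200, seed=1)
X_test, y_test = make_toy_data(60, seed=2)

cv_scores = cross_validate(X_train, y_train, k=5)
model = train_bernoulli_nb(X_train, y_train)
test_acc = accuracy(model, X_test, y_test)

print("5-fold CV accuracy per fold:", [round(s, 2) for s in cv_scores])
print("External test accuracy:", round(test_acc, 2))
```

The external set is generated separately from the training set so that its accuracy is not influenced by the folds used for internal validation, which mirrors the split the abstract reports. The figures printed by this toy run say nothing about the accuracies reported in the paper.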

Keywords: Carcinogenicity; Extended connectivity fingerprints (ECFP_14); In silico prediction; Molecular descriptors; Naïve Bayes classifier.

MeSH terms

  • Animals
  • Bayes Theorem*
  • Carcinogenicity Tests / methods*
  • Carcinogens / chemistry
  • Carcinogens / classification*
  • Carcinogens / toxicity*
  • Computer Simulation
  • Databases, Chemical
  • Models, Statistical*
  • Neoplasms / chemically induced*
  • Rats

Substances

  • Carcinogens