Combining machine learning models of in vitro and in vivo bioassays improves rat carcinogenicity prediction

Regul Toxicol Pharmacol. 2018 Apr:94:8-15. doi: 10.1016/j.yrtph.2018.01.008. Epub 2018 Jan 11.

Abstract

In vitro genotoxicity bioassays are cost-efficient methods of assessing potential carcinogens. However, many genotoxicity bioassays are inappropriate for detecting chemicals that act through non-genotoxic mechanisms, such as tumour promotion, which necessitates the use of in vivo rodent carcinogenicity (IVRC) assays. In silico IVRC modelling could potentially address the low throughput and high cost of this assay. We aimed to develop and combine computational QSAR models of novel bioassays for the prediction of IVRC results and to compare them with existing software. QSAR models were generated from existing Ames (n = 6512), Syrian Hamster Embryonic (SHE, n = 410), ISSCAN rodent carcinogenicity (ISC, n = 834) and GreenScreen GADD45a-GFP (n = 1415) chemical datasets. These models mapped the molecular descriptors of each compound to its respective assay result using machine learning algorithms (AdaBoost, k-Nearest Neighbours, C4.5 Decision Tree, Multilayer Perceptron, Random Forest). The best-performing models were combined using k-Nearest Neighbours to create a cascade model for IVRC prediction. High QSAR model performance was observed in ten repetitions of 10-fold cross-validation, with above 80% accuracy and 0.85 AUC for each assay dataset. The cascade model predicted rat carcinogenicity with 69.3% accuracy and 0.700 AUC. This study demonstrates the novelty of a combined approach for IVRC prediction, with higher performance than existing software.
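
The cascade idea described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the per-assay "models" below are hypothetical threshold functions standing in for trained QSAR models, the two-number descriptors and the labels are synthetic, and the names `cascade_features`, `knn_predict`, `repeated_kfold` and `cv_accuracy` are assumptions introduced here. Stage 1 turns each compound's descriptors into a vector of predicted assay outcomes; stage 2 feeds that vector to a k-Nearest Neighbours vote, evaluated with ten repetitions of 10-fold cross-validation.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def cascade_features(descriptors, assay_models):
    """Stage 1: each per-assay model maps descriptors to a predicted assay outcome."""
    return [model(descriptors) for model in assay_models]


def cv_accuracy(X, y, k_neighbors=3, folds=10, repeats=10, seed=0):
    """Stage 2 evaluation: pooled accuracy of kNN over all repeated k-fold test folds."""
    correct = total = 0
    for train, test in repeated_kfold(len(X), folds, repeats, seed):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            correct += knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]
            total += 1
    return correct / total


# Hypothetical stand-ins for trained per-assay QSAR models (assumption, not from the paper).
ASSAY_MODELS = [
    lambda d: 1.0 if d[0] > 0.5 else 0.0,
    lambda d: 1.0 if d[1] > 0.5 else 0.0,
    lambda d: 1.0 if d[0] + d[1] > 0.8 else 0.0,
]

# Synthetic compounds: two made-up descriptors each, with a made-up carcinogenicity label.
_rng = random.Random(1)
compounds = [(_rng.random(), _rng.random()) for _ in range(40)]
labels = [1 if a + b > 1.0 else 0 for a, b in compounds]

# Build the stage-2 inputs from stage-1 predictions, then cross-validate the kNN combiner.
X = [cascade_features(c, ASSAY_MODELS) for c in compounds]
accuracy = cv_accuracy(X, labels)
print(f"10x10-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

In practice the stage-1 predictions used to train the combiner would themselves need to come from held-out folds; otherwise the cascade's cross-validated accuracy is optimistically biased. The sketch omits that step for brevity.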

Keywords: Assay; Cancer; Carcinogenicity; Cheminformatics; Machine learning; Prediction; QSAR.

MeSH terms

  • Animals
  • Biological Assay
  • Carcinogenicity Tests
  • Carcinogens / toxicity*
  • Computer Simulation
  • Machine Learning*
  • Models, Biological*
  • Rats

Substances

  • Carcinogens