Some drugs and xenobiotics can disturb homeostasis, normal growth, differentiation, development or behavior during prenatal development or postnatally until puberty. Assessment of developmental toxicity is therefore one of the important safety considerations required by international regulatory agencies. In this investigation, seven machine learning methods (naïve Bayes, support vector machine, recursive partitioning, k-nearest neighbor, C4.5 decision tree, random forest and AdaBoost) were used to build binary classification models for developmental toxicity. Among these models, the naïve Bayes classifier showed the best predictive performance and stability: it gave 91.11% overall prediction accuracy, 91.50% balanced accuracy and a Matthews correlation coefficient (MCC) of 0.818 for the training set, and 83.93% concordance, 81.85% balanced accuracy and an MCC of 0.627 for the test set. The application domain of the models was analyzed, and only one chemical in the test set was identified as lying outside it. In addition, 10 important molecular descriptors related to developmental toxicity were selected by the genetic algorithm, which may help explain the mechanisms of developmental toxicants. The best naïve Bayes classification model could be employed as an alternative method for qualitative prediction of chemical-induced developmental toxicity in the early stages of drug development.
Keywords: Developmental toxicity; Genetic algorithm; In silico prediction; Machine learning; Molecular descriptor.
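For readers unfamiliar with the statistics quoted above, the following minimal sketch shows how overall accuracy (concordance), balanced accuracy and MCC are conventionally computed from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical illustrations and are not taken from the study's data; the sketch assumes both classes are present in the evaluated set.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, balanced accuracy and MCC for a binary classifier.

    tp/tn/fp/fn are the counts of true positives, true negatives,
    false positives and false negatives. Assumes each class has at
    least one member, so sensitivity and specificity are defined.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # overall accuracy (concordance)
    sensitivity = tp / (tp + fn)          # fraction of toxicants recovered
    specificity = tn / (tn + fp)          # fraction of non-toxicants recovered
    balanced_accuracy = (sensitivity + specificity) / 2

    # Matthews correlation coefficient; defined as 0 when the
    # denominator vanishes (a row or column of the matrix is empty).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not the study's results).
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(example)
```

Balanced accuracy and MCC are reported alongside overall accuracy because they remain informative when the toxic and non-toxic classes differ in size.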