ToxinPredictor: Computational models to predict the toxicity of molecules

Chemosphere. 2024 Dec 24:370:143900. doi: 10.1016/j.chemosphere.2024.143900. Online ahead of print.

Abstract

Predicting the toxicity of molecules is essential in fields such as drug discovery, environmental protection, and industrial chemical management. While traditional experimental methods are time-consuming and costly, computational models offer an efficient alternative. In this study, we introduce ToxinPredictor, a machine learning-based model that predicts the toxicity of small molecules from their structural properties. The model was trained on a curated dataset of 7550 toxic and 6514 non-toxic molecules, using feature selection and dimensionality reduction techniques such as Boruta and principal component analysis (PCA). The best-performing model, a Support Vector Machine (SVM), achieved state-of-the-art results with an AUROC of 91.7%, an F1-score of 84.9%, and an accuracy of 85.4%, outperforming existing solutions. SHAP analysis was applied to the SVM model to identify the molecular descriptors that contribute most to its toxicity predictions, enhancing interpretability. Despite challenges related to data quality, ToxinPredictor provides a reliable framework for toxicity risk assessment, paving the way for safer drug development and improved environmental health assessments. We also created a user-friendly webserver, ToxinPredictor (https://cosylab.iiitd.edu.in/toxinpredictor), to facilitate the search and prediction of toxic compounds.
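
As an illustration of the three metrics reported above (accuracy, F1-score, and AUROC), the following minimal Python sketch computes each from binary labels and classifier scores. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example, with 1 standing for toxic and 0 for non-toxic.

```python
# Illustrative sketch only: toy data, hypothetical function names.
# Labels: 1 = toxic, 0 = non-toxic.

def accuracy(y_true, y_pred):
    """Fraction of molecules whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the toxic (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random toxic molecule receives a higher score
    than a random non-toxic one; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (assumed values, not taken from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
threshold = 0.5  # assumed decision threshold
preds = [1 if s >= threshold else 0 for s in scores]

acc = accuracy(labels, preds)
f1 = f1_score(labels, preds)
auc = auroc(labels, scores)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUROC={auc:.3f}")
```

Accuracy and F1-score depend on the chosen threshold, whereas AUROC is computed from the ranking of the scores alone, which is why the three figures can differ for the same model.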

Keywords: Chemical safety; Deep learning; Drug discovery; Feature selection; Machine learning; Molecular toxicity prediction; Webserver.