Genotoxicity is a critical determinant in assessing the safety of pharmaceutical drugs, their metabolites, and impurities. Among genotoxicity tests, mechanistic assays such as the MultiFlow® DNA damage assay (MFA) allow investigation of the mode of action (MoA) of DNA damage through four mechanistic markers recorded at two time points. Previous studies have shown that machine learning (ML) can improve the precision of MoA classification for genotoxicants. Nevertheless, these approaches need to be tailored to specific chemical spaces and laboratory conditions for accurate risk assessment. In this study, we applied various state-of-the-art ML algorithms available in an open-source R package (caret) to build MFA-ML models using data from Bryce et al. (2016). The best model achieved 95% accuracy on the training dataset and correctly predicted genotoxicity in 16 of 17 cases in the test dataset. Incorporating molecular descriptor properties from established in silico models further improved performance, extending the approach to challenging pharmaceuticals whose pharmacological mode of action could interfere with the biomarker response. Further validation on an external test set of 49 non-overlapping compounds yielded a model accuracy of 92%. Additionally, a tailored graphical user interface was developed using a freely available R package (shiny) to support visual analysis of MFA data, including MoA predictions, facilitating broad use by laboratory scientists. Lastly, a perspective is proposed on integrating MoA predictions as additional evidence into a genotoxicity assessment workflow.
Keywords: DNA damage biomarker; genotoxicity; graphical user interface; machine learning; model deployment; model development; visualization.
© 2024 Environmental Mutagenesis and Genomics Society.