Bio-QSARs 2.0: Unlocking a new level of predictive power for machine learning-based ecotoxicity predictions by exploiting chemical and biological information

Environ Int. 2024 Apr:186:108607. doi: 10.1016/j.envint.2024.108607. Epub 2024 Apr 4.

Abstract

Practical, legal, and ethical reasons necessitate the development of methods to replace animal experiments. Computational techniques to acquire information that traditionally relied on animal testing are considered a crucial pillar among these so-called new approach methodologies. In this light, we recently introduced the Bio-QSAR concept for multispecies aquatic toxicity regression tasks. These machine learning models, trained on both chemical and biological information, are capable of both cross-chemical and cross-species predictions. Here, we significantly extend these models' applicability. This was realized by increasing the quantity of training data by a factor of approximately 20, accomplished by considering both additional chemicals and aquatic organisms. Additionally, variable test durations and associated random effects were accommodated by employing a machine learning algorithm that combines tree-boosting with mixed-effects modeling (i.e., Gaussian Process Boosting). We also explored various biological descriptors including Dynamic Energy Budget model parameters, taxonomic distances, as well as genus-specific traits and investigated the inclusion of mode-of-action information. Through these efforts, we developed Bio-QSARs for fish and aquatic invertebrates with exceptional predictive power (R² of up to 0.92 on independent test sets). Moreover, we made considerable strides to make models applicable for a range of use cases in environmental risk assessment as well as research and development of chemicals. Models were made fully explainable by implementing an algorithmic multicollinearity correction combined with SHapley Additive exPlanations. Furthermore, we devised novel approaches for applicability domain construction that take feature importance into account. We are hence confident these models, which are available via open access, will make a significant contribution towards the implementation of new approach methodologies and ultimately have the potential to support "Green Chemistry" and "Green Toxicology".
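
The abstract does not say how feature importance enters the applicability domain, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a k-nearest-neighbour distance check in which each standardized descriptor is weighted by a normalized importance score (for example, a mean absolute SHAP value). The function name knn_distance, the importance values, the choice of k, and the 95% quantile cutoff are all hypothetical.

```python
import numpy as np


def knn_distance(ref, query, weights, k=5, exclude_self=False):
    """Mean importance-weighted Euclidean distance to the k nearest reference rows.

    Set exclude_self=True only when `query` is the reference set itself, so that
    the zero distance of each row to itself is skipped.
    """
    scale = np.sqrt(np.asarray(weights, dtype=float))
    diff = (query * scale)[:, None, :] - (ref * scale)[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist.sort(axis=1)  # ascending distances per query row
    start = 1 if exclude_self else 0
    return dist[:, start:start + k].mean(axis=1)


# Synthetic stand-in for a descriptor matrix (rows: training records).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))

# Hypothetical normalized importances, one per descriptor (sum to 1).
importance = np.array([0.70, 0.25, 0.05])

# Standardize with training statistics so the weights act on comparable scales.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sd

# Assumed cutoff: 95th percentile of leave-one-out distances within the training set.
train_dist = knn_distance(Z_train, Z_train, importance, exclude_self=True)
threshold = np.quantile(train_dist, 0.95)

# Three queries: central, far out on the most important descriptor,
# and equally far out on the least important descriptor.
X_query = np.array([[0.0, 0.0, 0.0],
                    [8.0, 0.0, 0.0],
                    [0.0, 0.0, 8.0]])
query_dist = knn_distance(Z_train, (X_query - mu) / sd, importance)
in_domain = query_dist <= threshold

print(query_dist, threshold, in_domain)
```

Under this weighting, the same deviation counts for more when it occurs in a descriptor the model relies on heavily, so a query that is unusual only in a low-importance descriptor is penalized less than in an unweighted distance check.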

Keywords: 3R Principle; Computational ecotoxicology; Explainable artificial intelligence; New approach methodology (NAM); Quantitative structure-activity relationship (QSAR); Species trait.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Aquatic Organisms / drug effects
  • Ecotoxicology / methods
  • Fishes*
  • Invertebrates / drug effects
  • Machine Learning*
  • Quantitative Structure-Activity Relationship*
  • Water Pollutants, Chemical / analysis
  • Water Pollutants, Chemical / toxicity

Substances

  • Water Pollutants, Chemical