Ensemble Geometric Deep Learning of Aqueous Solubility

J Chem Inf Model. 2023 Dec 11;63(23):7338-7349. doi: 10.1021/acs.jcim.3c01536. Epub 2023 Nov 21.

Abstract

Geometric deep learning has become one of the main tools for exploiting large data sets to predict molecular properties such as aqueous solubility, a key factor in improving the pharmacokinetics of drug candidates. Two ensembles of graph neural networks were built: one based on spectral convolution and the other on spatial convolution. The resulting pretrained models, denoted SolNet-GCN and SolNet-GAT, respectively, significantly outperformed existing neural networks benchmarked on a validation set of 207 molecules. SolNet-GCN performed best on both the training and validation sets, with RMSE values of 0.53 and 0.72 log molar units and Pearson r² values of 0.95 and 0.75, respectively. In addition, the SolNet models ranked a series of benzophenylurea derivatives and a series of benzodiazepine derivatives in good agreement with a QM-based thermodynamic cycle approach at the PBE-vdW level of theory. However, testing the resulting models on a set of inhibitors of macrophage migration inhibitory factor (MIF) indicated that adding atomic attributes may improve aqueous solubility prediction. Useful attributes would flag atoms with a higher tendency to form intermolecular hydrogen bonds in the crystalline state and identify planar or nonplanar substructures.
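The abstract reports model accuracy as RMSE, in log molar units, and Pearson r² for ensembles of networks. The following is a minimal sketch of how such figures can be computed from an ensemble's predicted log S values. It rests on a hypothetical assumption that the ensemble prediction is the plain mean of its members' predictions, because the abstract does not state how SolNet combines its members. The helper names `ensemble_mean`, `rmse`, and `pearson_r2` are illustrative and do not come from the paper, and the toy numbers are invented.

```python
import math
from typing import List, Sequence


def ensemble_mean(member_preds: Sequence[Sequence[float]]) -> List[float]:
    """Average per-molecule predictions across ensemble members.

    member_preds[k][i] is member k's predicted log S for molecule i.
    Plain averaging is an assumption made for illustration only.
    """
    n_models = len(member_preds)
    n_mols = len(member_preds[0])
    return [sum(m[i] for m in member_preds) / n_models for i in range(n_mols)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the targets (log molar units)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("r^2 is undefined when targets or predictions are constant")
    return cov ** 2 / (var_t * var_p)


if __name__ == "__main__":
    # Hypothetical toy log S values; not data from the paper.
    y_true = [-2.0, -3.5, -1.0, -4.2]
    members = [
        [-2.2, -3.1, -1.4, -4.0],  # hypothetical ensemble member 1
        [-1.8, -3.9, -0.9, -4.6],  # hypothetical ensemble member 2
    ]
    y_ens = ensemble_mean(members)
    print("ensemble RMSE:", rmse(y_true, y_ens))
    print("ensemble r^2:", pearson_r2(y_true, y_ens))

    # A constant offset leaves r^2 unchanged but increases RMSE.
    shifted = [y + 1.0 for y in y_true]
    print("offset RMSE:", rmse(y_true, shifted))
    print("offset r^2:", pearson_r2(y_true, shifted))
```

The last two lines show why the two metrics are complementary. Pearson r² measures only the linear agreement between predicted and measured values, so it ignores a systematic shift. RMSE reports the absolute error in log molar units.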

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Neural Networks, Computer
  • Solubility
  • Thermodynamics
  • Water / chemistry

Substances

  • Water