Prediction of aqueous solubility for a diverse set of organic compounds based on atom-type electrotopological state indices

Eur J Med Chem. 2000 Dec;35(12):1081-8. doi: 10.1016/s0223-5234(00)01186-7.

Abstract

We describe robust methods for estimating the aqueous solubility of a set of 734 organic compounds from different structural classes, based on multiple linear regression (MLR) and artificial neural network (ANN) models. The structures were represented by atom-type electrotopological state (E-state) indices. For the MLR model with 34 structural parameters, the squared correlation coefficient and standard deviation were r² = 0.94 and s = 0.58 for the training set of 675 compounds. For the test set of 21 compounds, the corresponding statistics were r²(pred) = 0.80 and s = 0.87. Using the same set of parameters, an ANN with five neurons in the hidden layer significantly improved on the MLR results, giving standard deviations of s = 0.52 for the training set and s = 0.75 for the test set. The results show that accurate models for estimating the aqueous solubility of a large and diverse set of organic compounds can be built rapidly from easily calculated structural parameters.
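The two statistics quoted in the abstract, a squared correlation coefficient and a standard deviation s, can be illustrated with a minimal sketch. The abstract does not give the formulas used, so the definitions here are assumptions: r² is taken as 1 - SS_res/SS_tot, and s as the residual standard deviation with n - p - 1 degrees of freedom for a model with p parameters plus an intercept. The function name `fit_statistics` and the sample values are hypothetical and are not data from the paper.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """Return (r2, s) for a fitted model.

    Assumed definitions (not stated in the abstract):
      r2 = 1 - SS_res / SS_tot
      s  = sqrt(SS_res / (n - n_params - 1))
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - n_params - 1  # degrees of freedom, assuming an intercept term
    if dof <= 0:
        raise ValueError("not enough data points for this many parameters")

    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    r2 = 1.0 - ss_res / ss_tot
    s = math.sqrt(ss_res / dof)
    return r2, s


# Made-up log-solubility values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, s = fit_statistics(observed, predicted, n_params=1)
print(f"r2 = {r2:.3f}, s = {s:.3f}")
```

Running the same function on a held-out test set, with predictions from a model fitted only to the training set, gives statistics comparable in spirit to the r²(pred) and test-set s values reported above, though the paper's exact conventions may differ.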

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Linear Models
  • Neural Networks, Computer
  • Organic Chemicals / chemistry*
  • Solubility
  • Water / chemistry

Substances

  • Organic Chemicals
  • Water