In Silico Prediction of Chemicals Binding to Aromatase with Machine Learning Methods

Chem Res Toxicol. 2017 May 15;30(5):1209-1218. doi: 10.1021/acs.chemrestox.7b00037. Epub 2017 Apr 26.

Abstract

Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine-disrupting chemicals (EDCs) is important for toxicology and environmental risk assessment. In this study, on the basis of the Tox21 10K compound library, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods. To improve the prediction ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Model performance was evaluated on test and external validation sets containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and CC method, which exhibited an accuracy of 0.84 for the test set and 0.91 for the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provided a systematic assessment of chemicals binding to aromatase. The built models can be helpful to rapidly identify potential EDCs targeting aromatase.
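
The abstract names information gain as one of the criteria used to pick out representative substructures, but does not say how it was computed. The sketch below uses the textbook definition (label entropy minus the entropy remaining after splitting on a binary "substructure present" flag) and should be read as an illustration under that assumption, not as the authors' implementation. The function names and the eight-compound toy data are hypothetical and are not taken from the Tox21 library.

```python
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(has_substructure, labels):
    """Reduction in label entropy after splitting on a binary feature.

    has_substructure: 0/1 flags, one per compound (e.g. one fingerprint bit).
    labels: 0/1 class labels (1 = binder, 0 = nonbinder).
    """
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    n = len(labels)
    conditional = (
        (len(present) / n) * entropy(present)
        + (len(absent) / n) * entropy(absent)
    )
    return entropy(labels) - conditional


# Toy, made-up data: four binders followed by four nonbinders.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A substructure found in every binder and in no nonbinder.
perfect_bit = [1, 1, 1, 1, 0, 0, 0, 0]
# A substructure spread evenly across both classes.
uninformative_bit = [1, 1, 0, 0, 1, 1, 0, 0]

ig_perfect = information_gain(perfect_bit, labels)
ig_uninformative = information_gain(uninformative_bit, labels)
```

Under this definition, a substructure that separates the two classes completely reaches the maximum gain for a balanced binary problem (1 bit), while one that is equally common among binders and nonbinders scores 0. Ranking fingerprint bits by this value, alongside how often each substructure occurs in each class, is one plausible way to arrive at a short list such as the ketone, lactone, and nitrogen-containing groups mentioned above.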

Publication types

  • Validation Study

MeSH terms

  • Aromatase / metabolism*
  • Computer Simulation
  • Machine Learning*
  • Models, Theoretical
  • Neural Networks, Computer
  • Support Vector Machine

Substances

  • Aromatase