Automated Inference of Chemical Discriminants of Biological Activity

Sebastian Raschka; Anne M Scott; Mar Huertas; Weiming Li; Leslie A Kuhn

doi:10.1007/978-1-4939-7756-7_16

Methods Mol Biol. 2018:1762:307-338. doi: 10.1007/978-1-4939-7756-7_16.

Authors

Sebastian Raschka¹, Anne M Scott², Mar Huertas²,³, Weiming Li², Leslie A Kuhn⁴,⁵,⁶

Affiliations

¹ Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.
² Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA.
³ Department of Biology, Texas State University, San Marcos, TX, USA.
⁴ Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA. [email protected].
⁵ Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA. [email protected].
⁶ Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA. [email protected].

PMID: 29594779

Abstract

Ligand-based virtual screening has become a standard technique for the efficient discovery of bioactive small molecules. Following assays to determine the activity of compounds selected by virtual screening, or other approaches in which dozens to thousands of molecules have been tested, machine learning techniques make it straightforward to discover the patterns of chemical groups that correlate with the desired biological activity. Defining the chemical features that generate activity can guide the selection of molecules for subsequent rounds of screening and assaying, and can help design new, more active molecules for organic synthesis. The quantitative structure-activity relationship machine learning protocols we describe here, using decision trees, random forests, and sequential feature selection, take as input the chemical structure of a single, known active small molecule (e.g., an inhibitor, agonist, or substrate) for comparison with the structure of each tested molecule. Knowledge of the atomic structure of the protein target and its interactions with the active compound is not required. These protocols can be modified and applied to any data set that consists of a series of measured structural, chemical, or other features for each tested molecule, along with the experimentally measured value of the response variable you would like to predict or optimize for your project, for instance, inhibitory activity in a biological assay or ΔG_binding. To illustrate the use of different machine learning algorithms, we step through the analysis of a dataset of inhibitor candidates from virtual screening that were recently tested for their ability to inhibit GPCR-mediated signaling in a vertebrate.

Keywords: Fingerprint analysis; GPCR; Invasive species control; Ligand-based screening; Machine learning; Pharmacophore; Quantitative structure–activity relationship; Random forest; Virtual screening.
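
To make the idea in the abstract concrete, the sketch below ranks chemical features by how well each one separates active from inactive molecules. It uses the Gini impurity decrease, one common split criterion in decision-tree learning; the abstract does not say which criterion the chapter's protocols use. The feature names and the activity labels are hypothetical toy values invented for illustration, not data from the chapter, and only the Python standard library is assumed.

```python
from collections import Counter

# Hypothetical toy data (NOT from the chapter): each row is one tested molecule;
# each column records whether a chemical group of the reference active compound
# is matched (1) or not matched (0) in that molecule.
FEATURES = ["sulfate_match", "keto_match", "hydroxyl_match"]
X = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = active in the assay, 0 = inactive (made up)


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(X, y, col):
    """Weighted Gini impurity after splitting the molecules on one binary feature."""
    left = [label for row, label in zip(X, y) if row[col] == 0]
    right = [label for row, label in zip(X, y) if row[col] == 1]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def rank_features(X, y, names):
    """Return (name, impurity decrease) pairs, best discriminant first."""
    parent = gini(y)
    gains = [(name, parent - split_impurity(X, y, i)) for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


ranking = rank_features(X, y, FEATURES)
for name, gain in ranking:
    print(f"{name}: impurity decrease = {gain:.3f}")
```

In this toy set the hypothetical sulfate feature splits actives from inactives perfectly, so it ranks first. A random forest repeats this kind of split search over many trees grown on resampled data, and sequential feature selection adds or removes features one at a time according to model performance.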

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Computational Biology / methods*
Drug Evaluation, Preclinical
High-Throughput Screening Assays
Humans
Ligands
Machine Learning
Protein Binding
Quantitative Structure-Activity Relationship
Receptors, G-Protein-Coupled / chemistry*
Receptors, G-Protein-Coupled / metabolism*
Signal Transduction / drug effects
Small Molecule Libraries / chemistry*
Small Molecule Libraries / pharmacology
Vertebrates / metabolism

Substances

Ligands
Receptors, G-Protein-Coupled
Small Molecule Libraries