Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest

Cheng Wang; Yingkai Zhang

doi:10.1002/jcc.24667

J Comput Chem. 2017 Jan 30;38(3):169-177. doi: 10.1002/jcc.24667. Epub 2016 Nov 17.

Authors

Cheng Wang¹, Yingkai Zhang¹ ²

Affiliations

¹ Department of Chemistry, New York University, New York, New York, 10003.
² NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China.

Abstract

The development of new protein-ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations to experimental protein-ligand binding data with known crystal structures; however, more extensive tests indicate that such enhancement in scoring power comes with significant under-performance in docking and screening power tests compared to traditional scoring functions. In this work, to improve scoring-docking-screening powers of protein-ligand docking functions simultaneously, we have introduced a Δ_vina RF parameterization and feature selection framework based on random forest. Our developed scoring function Δ_vina RF₂₀, which employs 20 descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The Δ_vina RF₂₀ scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina. © 2016 Wiley Periodicals, Inc.
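
The abstract does not spell out the parameterization, so the following is only a minimal sketch of one plausible reading of the "Δ_vina" idea: a learned model predicts a correction that is added to the AutoDock Vina score, so the final score keeps the Vina term and adds a data-driven term on top. Everything here is an assumption for illustration. The function names are hypothetical, the two toy descriptors and all numbers are synthetic, the energy-to-pKd conversion assumes T = 298.15 K, and a small ensemble of bagged regression stumps stands in for a real random forest so that the example needs only the Python standard library.

```python
import math
import random

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def vina_to_pk(energy_kcal, temperature=298.15):
    """Convert a binding energy (kcal/mol) to pKd units via dG = RT ln Kd.
    The temperature is an assumption of this sketch."""
    return -energy_kcal / (R_KCAL * temperature * math.log(10))

def fit_stump(rows, targets):
    """Find the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    if best is None:  # no valid split: fall back to predicting the mean
        m = sum(targets) / len(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_ensemble(rows, targets, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training set.
    This is a simplified stand-in for a random forest, not the real algorithm."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return stumps

def predict_correction(stumps, row):
    """Average the stump predictions to get the learned correction."""
    preds = [ml if row[j] <= t else mr for j, t, ml, mr in stumps]
    return sum(preds) / len(preds)

def delta_score(stumps, vina_energy, row):
    """Final score = Vina score in pKd units + learned correction."""
    return vina_to_pk(vina_energy) + predict_correction(stumps, row)

# Synthetic toy data (NOT from the paper): two made-up descriptors per complex.
features = [[0.0, 1.0], [0.0, 2.0], [1.0, 1.0], [1.0, 2.0]]
vina_energies = [-6.0, -7.0, -6.5, -8.0]
shifts = [-1.0, -1.0, 1.0, 1.0]  # invented offsets from the Vina estimate
experimental_pk = [vina_to_pk(e) + s for e, s in zip(vina_energies, shifts)]

# The ensemble is trained on the residual, not on the affinity itself.
residuals = [pk - vina_to_pk(e) for pk, e in zip(experimental_pk, vina_energies)]
model = fit_ensemble(features, residuals)

print(delta_score(model, -7.0, [1.0, 1.5]))
```

Training on the residual rather than on the raw affinity is the design choice this sketch is meant to highlight: the physics-based Vina term stays in the final score, and the learned term only adjusts it. In practice a library implementation of random forest regression and the descriptors defined in the paper would replace the stand-ins used here.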

Keywords: docking; machine learning; protein-ligand binding affinity; random forest; scoring function.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Ligands
Molecular Docking Simulation*
Proteins / chemistry*

Substances

Ligands
Proteins

Grants and funding

R01 GM079223/GM/NIGMS NIH HHS/United States