Motivation: Fingerprints (FPs) are the most common small molecule representation in cheminformatics. There are a wide variety of FPs, and the Extended Connectivity Fingerprint (ECFP) is one of the best-suited for general applications. Despite the overall FP abundance, only a few FPs represent the 3D structure of the molecule, and hardly any encode protein-ligand interactions.
Results: Here, we present a Protein-Ligand Extended Connectivity (PLEC) FP that implicitly encodes protein-ligand interactions by pairing the ECFP environments from the ligand and the protein. PLEC FPs were used to construct different machine learning models tailored for predicting protein-ligand affinities (pKi∕d). Even the simplest linear model built on the PLEC FP achieved Rp = 0.817 on the Protein Databank (PDB) bind v2016 'core set', demonstrating its descriptive power.
Availability and implementation: The PLEC FP has been implemented in the Open Drug Discovery Toolkit (https://github.com/oddt/oddt).
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.