EP-Pred: A Machine Learning Tool for Bioprospecting Promiscuous Ester Hydrolases

Biomolecules. 2022 Oct 21;12(10):1529. doi: 10.3390/biom12101529.

Abstract

When bioprospecting for novel industrial enzymes, substrate promiscuity is a desirable property because it increases the reusability of the enzyme. Among industrial enzymes, ester hydrolases are highly relevant, and demand for them continues to grow. However, the search for new substrate-promiscuous ester hydrolases is not trivial, since the mechanism behind this property is strongly influenced by the structural and physicochemical characteristics of the active site. These characteristics must be computed from the 3D structure, which is rarely available and expensive to determine, hence the need for a method that can predict promiscuity from sequence alone. Here we report such a method, EP-pred, an ensemble binary classifier that combines three machine learning algorithms: a support vector machine (SVM), k-nearest neighbors (KNN), and a linear model. EP-pred was evaluated against the Lipase Engineering Database together with a hidden Markov approach, leading to a final set of ten sequences predicted to encode promiscuous esterases. Experimental results confirmed the validity of our method, since all ten proteins were found to exhibit broad substrate ambiguity.
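
To illustrate how an ensemble binary classifier can merge the outputs of three base models, the following is a minimal sketch and not the authors' implementation. It assumes simple majority voting, which the abstract does not specify; the function names and the per-model predictions are hypothetical, and feature extraction and model training are left out entirely.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label (0 or 1) predicted by most base classifiers."""
    if len(votes) % 2 == 0:
        # With two classes, an odd number of voters can never tie.
        raise ValueError("use an odd number of voters to avoid ties")
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine per-model label lists into one label per sequence.

    per_model_predictions maps a model name to a list of 0/1 labels,
    one per candidate sequence, with all lists in the same order.
    """
    columns = zip(*per_model_predictions.values())
    return [majority_vote(list(column)) for column in columns]


# Hypothetical outputs for four candidate sequences (1 = promiscuous).
predictions = {
    "svm":    [1, 0, 1, 0],
    "knn":    [1, 0, 0, 0],
    "linear": [0, 1, 1, 0],
}
labels = ensemble_predict(predictions)
print(labels)
```

In practice, a library such as scikit-learn offers an equivalent combination through its `VotingClassifier`; the pure-Python version here only makes the voting step explicit.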

Keywords: biocatalysts; bioprospecting; esterases/lipases; hydrolases; machine learning; supervised learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bioprospecting*
  • Esterases* / chemistry
  • Esters
  • Lipase / chemistry
  • Machine Learning

Substances

  • Esterases
  • Lipase
  • Esters

Grants and funding

This study was conducted under the auspices of the FuturEnzyme and Oxipro Projects funded by the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 101000327 and 101000607. We also acknowledge financial support under Grants PID2020-112758RB-I00 (M.F.), PID2019-106370RB-I00 (V.G.) and PDC2021-121534-I00 (M.F.) and PID2019-106370RB-I00/AEI/10.13039/501100011033 (A.R.-M.), from the Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación (AEI) (Digital Object Identifier 10.13039/501100011033), Fondo Europeo de Desarrollo Regional (FEDER) and the European Union (“NextGenerationEU/PRTR”), and Grant 2020AEP061 (M.F.) from the Agencia Estatal CSIC.