Protease target prediction via matrix factorization

Bioinformatics. 2019 Mar 15;35(6):923-929. doi: 10.1093/bioinformatics/bty746.

Abstract

Motivation: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models are limited in scope to specific protease families (such as caspases), and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration.

Results: By representing protease-protein target information in the form of relational matrices, we design a model that (i) is general and not limited to a single protease family, and (ii) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach performs better, even when compared with models that focus specifically on a single protease family.
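
As an illustration of the relational-matrix idea described above, the following is a minimal sketch of a generic masked low-rank matrix factorization on a toy protease-by-protein matrix. It is not the authors' MaDDA implementation (which is in Matlab and integrates several data sources); the toy data, the function names `factorize` and `masked_rmse`, and all hyperparameters are assumptions made purely for illustration.

```python
import numpy as np

def masked_rmse(R, mask, scores):
    """Root-mean-square error computed over observed entries only."""
    diff = mask * (R - scores)
    return float(np.sqrt((diff ** 2).sum() / mask.sum()))

def factorize(R, mask, rank=2, lam=0.1, lr=0.05, epochs=2000, seed=0):
    """Approximate R by P @ T.T using only entries where mask == 1.

    Plain gradient descent on the squared error of observed entries
    with an L2 penalty (lam) on both factor matrices.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = 0.1 * rng.standard_normal((n, rank))  # latent protease factors
    T = 0.1 * rng.standard_normal((m, rank))  # latent protein factors
    for _ in range(epochs):
        E = mask * (R - P @ T.T)              # residual on observed entries
        P_next = P + lr * (E @ T - lam * P)
        T = T + lr * (E.T @ P - lam * T)
        P = P_next
    return P, T

# Hypothetical toy data: 3 proteases x 4 candidate target proteins.
# R[i, j] = 1 means protease i is known to cleave protein j.
R = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]], dtype=float)
# mask[i, j] = 1 where the pair has been observed; 0 where it is unknown.
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 1],
                 [1, 1, 1, 1]], dtype=float)

rmse_before = masked_rmse(R, mask, np.zeros_like(R))
P, T = factorize(R, mask)
scores = P @ T.T                              # predicted scores for all pairs
rmse_after = masked_rmse(R, mask, scores)
```

Entries of `scores` at positions where `mask` is 0 can be read as predictions for unobserved protease-protein pairs. A data-integration model of the kind the abstract describes would additionally couple this matrix with others (for example sequence, pathway, domain and interaction data), which this single-matrix sketch does not attempt.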

Availability and implementation: https://gitlab.com/smarini/MaDDA/ (Matlab code and the data used).

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Machine Learning
  • Peptide Hydrolases
  • Software*

Substances

  • Peptide Hydrolases