Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins

Moritz Ertelt; Vikram Khipple Mulligan; Jack B Maguire; Sergey Lyskov; Rocco Moretti; Torben Schiffner; Jens Meiler; Clara T Schoeder

doi:10.1371/journal.pcbi.1011939

PLoS Comput Biol. 2024 Mar 14;20(3):e1011939. doi: 10.1371/journal.pcbi.1011939. eCollection 2024 Mar.

Authors

Moritz Ertelt^{1,2}, Vikram Khipple Mulligan³, Jack B Maguire⁴, Sergey Lyskov⁵, Rocco Moretti^{6,7}, Torben Schiffner¹, Jens Meiler^{1,2,6,7}, Clara T Schoeder^{1,2}

Affiliations

¹ Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany.
² Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany.
³ Center for Computational Biology, Flatiron Institute, New York, New York, United States of America.
⁴ Program in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
⁵ Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.
⁶ Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America.
⁷ Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America.

Abstract

Post-translational modifications (PTMs) of proteins play a vital role in their function and stability. These modifications influence protein folding, signaling, protein-protein interactions, enzyme activity, binding affinity, aggregation, degradation, and much more. To date, over 400 types of PTMs have been described, representing chemical diversity well beyond the genetically encoded amino acids. Such modifications pose a challenge to the successful design of proteins, but also represent a major opportunity to diversify the protein engineering toolbox. To this end, we first trained artificial neural networks (ANNs) to predict eighteen of the most abundant PTMs, including protein glycosylation, phosphorylation, methylation, and deamidation. In a second step, these models were implemented inside the computational protein modeling suite Rosetta, which allows flexible combination with existing protocols to model the modified sites and understand their impact on protein stability as well as function. Lastly, we developed a new design protocol that either maximizes or minimizes the predicted probability of a particular site being modified. We find that this combination of ANN prediction and structure-based design can enable the modification of existing, as well as the introduction of novel, PTMs. The potential applications of our work include, but are not limited to, glycan masking of epitopes, strengthening protein-protein interactions through phosphorylation, as well as protecting proteins from deamidation liabilities. These applications are especially important for the design of new protein therapeutics where PTMs can drastically change the therapeutic properties of a protein. Our work adds novel tools to Rosetta's protein engineering toolbox that allow for the rational design of PTMs.
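The design idea in the abstract, mutating residues around a site so that a predictor's modification probability goes up or down, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_ptm_probability` is a hypothetical stand-in for the trained ANNs (it encodes only the well-known N-X-S/T sequon rule for N-linked glycosylation, with made-up probability values), and `design_site` is a plain greedy sequence search. It does not use Rosetta, structural features, or the authors' actual protocol.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_ptm_probability(sequence: str, site: int) -> float:
    """Hypothetical stand-in for a trained predictor (NOT the paper's ANN).

    Scores N-linked glycosylation at 0-based `site` using only the
    N-X-S/T sequon rule (X must not be proline). The returned values
    are illustrative constants, not fitted probabilities.
    """
    if site + 2 >= len(sequence) or sequence[site] != "N":
        return 0.0
    if sequence[site + 1] == "P":
        return 0.05
    if sequence[site + 2] == "T":
        return 0.9
    if sequence[site + 2] == "S":
        return 0.7
    return 0.05


def design_site(sequence: str, site: int, maximize: bool = True, window: int = 2):
    """Greedy point-mutation search around `site` (assumed scheme).

    Tries every single substitution within `window` residues of the
    site (the site itself stays fixed) and keeps a mutation only if it
    moves the predicted probability in the requested direction.
    Repeats until no single mutation improves the score.
    """
    sign = 1.0 if maximize else -1.0
    best_seq = sequence
    best_p = toy_ptm_probability(sequence, site)
    improved = True
    while improved:
        improved = False
        current = best_seq
        lo = max(0, site - window)
        hi = min(len(current), site + window + 1)
        for pos in range(lo, hi):
            if pos == site:
                continue
            for aa in AMINO_ACIDS:
                if aa == current[pos]:
                    continue
                candidate = current[:pos] + aa + current[pos + 1:]
                p = toy_ptm_probability(candidate, site)
                if sign * p > sign * best_p:
                    best_seq, best_p = candidate, p
                    improved = True
    return best_seq, best_p


if __name__ == "__main__":
    # Introduce a sequon at position 2, then remove it again.
    print(design_site("AANGAA", 2, maximize=True))
    print(design_site("AANGTA", 2, maximize=False))
```

In a structure-based setting, the scoring function would also have to account for the energetic cost of each mutation, which this sequence-only sketch ignores.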

Copyright: © 2024 Ertelt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Glycosylation
Machine Learning
Phosphorylation
Protein Processing, Post-Translational*
Proteins* / chemistry

Substances

Proteins

Grants and funding

This work is supported through a Rosetta mini-grant under award number RC22021 from RosettaCommons (www.rosettacommons.org) held by CTS. ME, JM and CTS acknowledge the financial support by the Federal Ministry of Education and Research of Germany and by the Sächsische Staatsministerium für Wissenschaft Kultur und Tourismus in the program Center of Excellence for AI-research "Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig", project identification number: ScaDS.AI (https://scads.ai/). ME's position is funded through an award by ScaDS.AI. VKM is supported by the Simons Foundation (https://www.simonsfoundation.org/). TS is supported by a Sofja Kovalevskaja prize from the Alexander-von-Humboldt foundation (https://www.humboldt-foundation.de/), while JM is supported by an Alexander-von-Humboldt professorship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.