Computational Modeling of Protein Stability: Quantitative Analysis Reveals Solutions to Pervasive Problems

Aron Broom; Kyle Trainor; Zachary Jacobi; Elizabeth M Meiering

doi:10.1016/j.str.2020.04.003

Structure. 2020 Jun 2;28(6):717-726.e3. doi: 10.1016/j.str.2020.04.003. Epub 2020 May 5.

Authors

Aron Broom¹, Kyle Trainor¹, Zachary Jacobi¹, Elizabeth M Meiering²

Affiliations

¹ University of Waterloo, Department of Chemistry, Waterloo, N2L 3G1, Canada.
² University of Waterloo, Department of Chemistry, Waterloo, N2L 3G1, Canada.

PMID: 32375024

Abstract

Accurate modeling of the effects of mutations on protein stability is central to understanding and controlling proteins in myriad natural and applied contexts. Here, we reveal through rigorous quantitative analysis that stability prediction tools often favor mutations that increase stability at the expense of solubility. Moreover, while these tools may accurately identify strongly destabilizing mutations, the experimental effect of mutations predicted to stabilize is actually near neutral on average. The commonly used "classification accuracy" metric obscures this reality; accordingly, we recommend alternative performance measures, such as the Matthews correlation coefficient (MCC). We demonstrate that an absurdly simple machine-learning algorithm, a neural network of just two neurons, unexpectedly achieves high classification accuracy, but its inadequacies are revealed by a low MCC. Despite the above limitations, making multiple mutations markedly improves the prospects for achieving a stabilization target, and modest improvements in the precision of future tools may yield disproportionate gains.
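
The contrast between classification accuracy and MCC can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper; they assume an imbalanced set of 1000 mutations in which 200 are stabilizing (the positive class) and 800 are destabilizing. The function names `accuracy` and `mcc` are likewise illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    e.g. when a classifier never predicts one of the classes).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (assumption, not data from the paper):
# 200 stabilizing (positive) and 800 destabilizing (negative) mutations.

# A trivial classifier that labels every mutation as destabilizing.
always_destabilizing = dict(tp=0, tn=800, fp=0, fn=200)

# A classifier that recovers half of the stabilizing mutations.
modest = dict(tp=100, tn=700, fp=100, fn=100)

for name, counts in [("always destabilizing", always_destabilizing),
                     ("modest classifier", modest)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
```

Under these assumed counts both classifiers reach an accuracy of 0.800, yet the trivial one has an MCC of 0.000 while the other reaches 0.375. This is the kind of discrepancy that accuracy alone hides when one class dominates the data set.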

Keywords: computational protein design; computational protein engineering; computational protein stability prediction; machine learning; point mutations; protein design; protein engineering; protein forcefields; protein solubility; protein stability.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Databases, Protein
Machine Learning
Models, Molecular
Mutation*
Protein Folding
Protein Stability
Proteins / chemistry*
Proteins / genetics

Substances

Proteins