Computational Modeling of Protein Stability: Quantitative Analysis Reveals Solutions to Pervasive Problems

Structure. 2020 Jun 2;28(6):717-726.e3. doi: 10.1016/j.str.2020.04.003. Epub 2020 May 5.

Abstract

Accurate modeling of the effects of mutations on protein stability is central to understanding and controlling proteins in myriad natural and applied contexts. Here, we reveal through rigorous quantitative analysis that stability prediction tools often favor mutations that increase stability at the expense of solubility. Moreover, while these tools may accurately identify strongly destabilizing mutations, the experimental effect of mutations predicted to stabilize is actually near neutral on average. The commonly used "classification accuracy" metric obscures this reality; accordingly, we recommend alternative performance measures, such as the Matthews correlation coefficient (MCC). We demonstrate that an absurdly simple machine-learning algorithm, a neural network of just two neurons, unexpectedly achieves high classification accuracy, but its inadequacies are revealed by a low MCC. Despite the above limitations, making multiple mutations markedly improves the prospects for achieving a stabilization target, and modest improvements in the precision of future tools may yield disproportionate gains.
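
The contrast between classification accuracy and the MCC can be illustrated with a minimal sketch. The confusion-matrix counts below are invented for illustration and are not taken from the paper; they assume a hypothetical set of 100 mutations in which only 20 are truly stabilizing, and a predictor that labels almost everything as destabilizing. The function names are likewise illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    e.g. when one class is never predicted).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts (assumption, not data from the paper):
# 20 truly stabilizing mutations, 80 truly destabilizing ones.
# "Positive" here means "predicted stabilizing".
tp, fn = 2, 18   # stabilizing mutations: 2 found, 18 missed
fp, tn = 2, 78   # destabilizing mutations: 2 mislabeled, 78 correct

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)

print(f"accuracy = {acc:.2f}")
print(f"MCC      = {score:.2f}")
```

With these made-up counts the accuracy is 0.80 while the MCC is roughly 0.15: the predictor is rewarded by accuracy mainly for matching the majority class, whereas the MCC penalizes its failure to recover the stabilizing minority.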

Keywords: computational protein design; computational protein engineering; computational protein stability prediction; machine learning; point mutations; protein design; protein engineering; protein forcefields; protein solubility; protein stability.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein
  • Machine Learning
  • Models, Molecular
  • Mutation*
  • Protein Folding
  • Protein Stability
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins