Accurate modeling of the effects of mutations on protein stability is central to understanding and controlling proteins in myriad natural and applied contexts. Here, we reveal through rigorous quantitative analysis that stability prediction tools often favor mutations that increase stability at the expense of solubility. Moreover, while these tools may accurately identify strongly destabilizing mutations, the experimental effect of mutations predicted to be stabilizing is, on average, close to neutral. The commonly used "classification accuracy" metric obscures this reality; accordingly, we recommend complementary performance measures, such as the Matthews correlation coefficient (MCC). We demonstrate that an absurdly simple machine-learning algorithm, a neural network of just two neurons, unexpectedly achieves high classification accuracy, yet its inadequacies are revealed by a low MCC. Despite these limitations, making multiple mutations markedly improves the prospects of achieving a stabilization target, and modest improvements in the precision of future tools may yield disproportionate gains.
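For context, the MCC referred to above is the standard correlation-based classification measure computed from the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Unlike raw accuracy, it uses all four counts and therefore remains near zero for a classifier that merely exploits class imbalance:

\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
\]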
Keywords: computational protein design; computational protein engineering; computational protein stability prediction; machine learning; point mutations; protein design; protein engineering; protein forcefields; protein solubility; protein stability.
Copyright © 2020 Elsevier Ltd. All rights reserved.