Using machine learning to predict the effects and consequences of mutations in proteins

Curr Opin Struct Biol. 2023 Feb:78:102518. doi: 10.1016/j.sbi.2022.102518. Epub 2023 Jan 3.

Abstract

Machine and deep learning approaches can leverage the increasingly large datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that, although the specifics of the machine learning algorithm matter, the more important constraint is the availability and quality of the data used for training. In cases where little experimental data are available, unsupervised and self-supervised pre-training on generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.
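The hybrid idea mentioned above can be illustrated with a minimal sketch. An unsupervised model is fit to unlabeled homologs, and its score is then used as one extra feature in a supervised model trained on a small set of labeled variants. The sketch assumes a simple site-independent (per-position frequency) model as the unsupervised component and closed-form ridge regression on one-hot features plus that score as the supervised component. These choices, all function names, and the toy sequences and fitness labels are hypothetical illustrations, not methods or data from this review. In practice, the unsupervised component is often a much larger pre-trained model, such as a protein language model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Flattened one-hot encoding (length * 20) of an aligned sequence."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def fit_site_independent_model(family, pseudocount=1.0):
    """Unsupervised step (assumed model): per-position log-frequencies
    estimated from unlabeled, aligned homologous sequences."""
    counts = np.full((len(family[0]), len(AMINO_ACIDS)), pseudocount)
    for seq in family:
        for pos, aa in enumerate(seq):
            counts[pos, AA_INDEX[aa]] += 1.0
    return np.log(counts / counts.sum(axis=1, keepdims=True))


def unsupervised_score(seq, log_freqs):
    """Sum of per-position log-frequencies: higher means more 'family-like'."""
    return float(sum(log_freqs[pos, AA_INDEX[aa]] for pos, aa in enumerate(seq)))


def hybrid_features(seqs, log_freqs):
    """Hypothetical feature matrix: [one-hot | unsupervised score | intercept]."""
    rows = [np.append(one_hot(s), unsupervised_score(s, log_freqs)) for s in seqs]
    return np.hstack([np.array(rows), np.ones((len(seqs), 1))])


def fit_hybrid_ridge(seqs, y, log_freqs, alpha=1.0):
    """Supervised step: closed-form ridge regression on the hybrid features."""
    X = hybrid_features(seqs, log_freqs)
    penalty = alpha * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unregularized
    return np.linalg.solve(X.T @ X + penalty, X.T @ np.asarray(y, dtype=float))


def predict_hybrid(seqs, w, log_freqs):
    """Predict fitness for new sequences with the fitted weights."""
    return hybrid_features(seqs, log_freqs) @ w


# Toy usage: made-up homologs and made-up fitness labels, for illustration only.
family = ["MKTAY", "MKTAY", "MKSAY", "MRTAY"]
log_freqs = fit_site_independent_model(family)

train_seqs = ["MKTAY", "MKSAY", "MRTAY", "MKTAW", "MWTAY"]
train_y = np.array([1.0, 0.8, 0.6, 0.2, 0.1])

w = fit_hybrid_ridge(train_seqs, train_y, log_freqs, alpha=0.1)
preds = predict_hybrid(train_seqs, w, log_freqs)
print(np.round(preds, 3))
```

The unsupervised score lets the regression borrow information from unlabeled sequences, which matters most when only a handful of labeled variants exist. Swapping the frequency model for a pre-trained network, or fine-tuning that network directly on the labels (transfer learning), follows the same two-stage pattern.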

Keywords: Convolutional neural network; Deep learning; Hybrid learning; Mutational effect; Protein engineering; Transformer.

Publication types

  • Review
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Machine Learning*
  • Mutation
  • Neural Networks, Computer*