Machine learning for functional protein design

Pascal Notin; Nathan Rollins; Yarin Gal; Chris Sander; Debora Marks

doi:10.1038/s41587-024-02127-0

Nat Biotechnol. 2024 Feb;42(2):216-228. doi: 10.1038/s41587-024-02127-0. Epub 2024 Feb 15.

Authors

Pascal Notin^# ¹ ², Nathan Rollins^# ³, Yarin Gal ⁴, Chris Sander ⁵ ⁶, Debora Marks ⁷ ⁸

Affiliations

¹ Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
² Department of Computer Science, University of Oxford, Oxford, UK.
³ Seismic Therapeutic, Cambridge, MA, USA.
⁴ Department of Computer Science, University of Oxford, Oxford, UK.
⁵ Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
⁶ Broad Institute of Harvard and MIT, Cambridge, MA, USA.
⁷ Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
⁸ Broad Institute of Harvard and MIT, Cambridge, MA, USA.

^# Contributed equally.

PMID: 38361074
DOI: 10.1038/s41587-024-02127-0

Abstract

Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and structure data have radically transformed computational protein design. New methods promise to escape the constraints of natural and laboratory evolution, accelerating the generation of proteins for applications in biotechnology and medicine. To make sense of the exploding diversity of machine learning approaches, we introduce a unifying framework that classifies models on the basis of their use of three core data modalities: sequences, structures and functional labels. We discuss the new capabilities and outstanding challenges for the practical design of enzymes, antibodies, vaccines, nanomachines and more. We then highlight trends shaping the future of this field, from large-scale assays to more robust benchmarks, multimodal foundation models, enhanced sampling strategies and laboratory automation.

Publication types

Review

MeSH terms

Amino Acid Sequence
Antibodies
Biotechnology
Machine Learning*
Proteins*

Substances

Proteins
Antibodies
