Protein design using structure-based residue preferences

David Ding; Ada Y Shaw; Sam Sinai; Nathan Rollins; Noam Prywes; David F Savage; Michael T Laub; Debora S Marks

doi:10.1038/s41467-024-45621-4

Nat Commun. 2024 Feb 22;15(1):1639. doi: 10.1038/s41467-024-45621-4.

Authors

David Ding¹, Ada Y Shaw², Sam Sinai³, Nathan Rollins⁴, Noam Prywes⁵, David F Savage⁵ ⁶ ⁷, Michael T Laub⁸ ⁹, Debora S Marks¹⁰

Affiliations

¹ Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA.
² Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA.
³ Dyno Therapeutics, Watertown, MA, 02472, USA.
⁴ Seismic Therapeutics, Lab Central, Cambridge, MA, 02142, USA.
⁵ Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA.
⁶ Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
⁷ Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA.
⁸ Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
⁹ Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
¹⁰ Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA.

Abstract

Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural context around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models in creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
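The idea of per-residue preferences without interaction terms can be illustrated with a small additive model. The sketch below is not the authors' implementation: it assumes a plain ridge-regularised linear fit on one-hot encoded variants, uses synthetic data, and every name in it (`one_hot`, `fit_additive`, `true_prefs`, the three-site library) is hypothetical. The training set size follows the abstract's rule of thumb of roughly 100 observations per mutated residue.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(variants):
    """Encode equal-length sequences as an (n_variants, n_sites * 20) 0/1 matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_sites = len(variants[0])
    X = np.zeros((len(variants), n_sites * len(AMINO_ACIDS)))
    for row, seq in enumerate(variants):
        for pos, aa in enumerate(seq):
            X[row, pos * len(AMINO_ACIDS) + index[aa]] = 1.0
    return X


def fit_additive(X, y, ridge=1e-3):
    """Fit per-site amino acid preferences plus an intercept (no interaction terms)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    A = X1.T @ X1 + ridge * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ y)


def predict(X, w):
    """Sum the fitted per-site preferences and add the intercept."""
    return X @ w[:-1] + w[-1]


def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic library (assumption): 3 mutated sites, purely additive effects plus noise.
rng = np.random.default_rng(0)
n_sites = 3
true_prefs = rng.normal(size=(n_sites, len(AMINO_ACIDS)))
seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=n_sites)) for _ in range(2000)]
X = one_hot(seqs)
y = X @ true_prefs.ravel() + rng.normal(scale=0.1, size=len(seqs))

# Train on ~100 observations per mutated site, evaluate on the held-out rest.
n_train = 100 * n_sites
w = fit_additive(X[:n_train], y[:n_train])
y_hat = predict(X[n_train:], w)
r2 = r_squared(y[n_train:], y_hat)
pearson = np.corrcoef(y[n_train:], y_hat)[0, 1]
print(f"held-out R^2 = {r2:.3f}, Pearson r = {pearson:.3f}")
```

Because the synthetic effects are additive by construction, the fit here is nearly perfect. On real combinatorial data, the gap between this kind of model and the measurements gives a rough sense of how much interactions between mutations matter.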

MeSH terms

Amino Acids / chemistry
Mutation
Neural Networks, Computer*
Proteins* / metabolism

Substances

Proteins
Amino Acids

Grants and funding

R01 CA260415/CA/NCI NIH HHS/United States