De novo design of protein structure and function with RFdiffusion

Joseph L Watson; David Juergens; Nathaniel R Bennett; Brian L Trippe; Jason Yim; Helen E Eisenach; Woody Ahern; Andrew J Borst; Robert J Ragotte; Lukas F Milles; Basile I M Wicky; Nikita Hanikel; Samuel J Pellock; Alexis Courbet; William Sheffler; Jue Wang; Preetham Venkatesh; Isaac Sappington; Susana Vázquez Torres; Anna Lauko; Valentin De Bortoli; Emile Mathieu; Sergey Ovchinnikov; Regina Barzilay; Tommi S Jaakkola; Frank DiMaio; Minkyung Baek; David Baker

doi:10.1038/s41586-023-06415-8

Nature. 2023 Aug;620(7976):1089-1100. doi: 10.1038/s41586-023-06415-8. Epub 2023 Jul 11.

Authors

Joseph L Watson^#^{1,2}, David Juergens^#^{1,2,3}, Nathaniel R Bennett^#^{1,2,3}, Brian L Trippe^#^{2,4,5}, Jason Yim^#^{2,6}, Helen E Eisenach^#^{1,2}, Woody Ahern^#^{1,2,7}, Andrew J Borst^{1,2}, Robert J Ragotte^{1,2}, Lukas F Milles^{1,2}, Basile I M Wicky^{1,2}, Nikita Hanikel^{1,2}, Samuel J Pellock^{1,2}, Alexis Courbet^{1,2,8}, William Sheffler^{1,2}, Jue Wang^{1,2}, Preetham Venkatesh^{1,2,9}, Isaac Sappington^{1,2,9}, Susana Vázquez Torres^{1,2,9}, Anna Lauko^{1,2,9}, Valentin De Bortoli⁸, Emile Mathieu¹⁰, Sergey Ovchinnikov^{11,12}, Regina Barzilay⁶, Tommi S Jaakkola⁶, Frank DiMaio^{1,2}, Minkyung Baek¹³, David Baker^{14,15,16}

Affiliations

¹ Department of Biochemistry, University of Washington, Seattle, WA, USA.
² Institute for Protein Design, University of Washington, Seattle, WA, USA.
³ Graduate Program in Molecular Engineering, University of Washington, Seattle, WA, USA.
⁴ Columbia University, Department of Statistics, New York, NY, USA.
⁵ Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA.
⁶ Massachusetts Institute of Technology, Cambridge, MA, USA.
⁷ Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
⁸ National Centre for Scientific Research, École Normale Supérieure rue d'Ulm, Paris, France.
⁹ Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA, USA.
¹⁰ Department of Engineering, University of Cambridge, Cambridge, UK.
¹¹ Faculty of Applied Sciences, Harvard University, Cambridge, MA, USA.
¹² John Harvard Distinguished Science Fellowship, Harvard University, Cambridge, MA, USA.
¹³ School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
¹⁴ Department of Biochemistry, University of Washington, Seattle, WA, USA.
¹⁵ Institute for Protein Design, University of Washington, Seattle, WA, USA.
¹⁶ Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.

^# Contributed equally.

Abstract

There has been considerable recent progress in designing new proteins using deep-learning methods^1-9. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models^10,11 have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.

MeSH terms

Catalytic Domain
Cryoelectron Microscopy
Deep Learning*
Hemagglutinin Glycoproteins, Influenza Virus / chemistry
Hemagglutinin Glycoproteins, Influenza Virus / metabolism
Hemagglutinin Glycoproteins, Influenza Virus / ultrastructure
Protein Binding
Proteins* / chemistry
Proteins* / metabolism
Proteins* / ultrastructure

Substances

Hemagglutinin Glycoproteins, Influenza Virus
Proteins
