De novo protein design with a denoising diffusion network independent of pretrained structure prediction models

Nat Methods. 2024 Nov;21(11):2107-2116. doi: 10.1038/s41592-024-02437-w. Epub 2024 Oct 9.

Abstract

The recent success of RFdiffusion, a method for protein structure design with a denoising diffusion probabilistic model, has relied on fine-tuning the RoseTTAFold structure prediction network for protein backbone denoising. Here, we introduce SCUBA-diffusion (SCUBA-D), a protein backbone denoising diffusion probabilistic model trained from scratch, with co-diffusion of a sequence representation to enhance model regularization and with adversarial losses to minimize data-out-of-distribution errors. While matching the performance of the pretrained RoseTTAFold-based RFdiffusion in generating experimentally realizable protein structures, SCUBA-D readily generates protein structures with not-yet-observed overall folds that differ from those predictable with RoseTTAFold. The accuracy of SCUBA-D was confirmed by the X-ray structures of 16 designed proteins and a protein complex, and by experiments validating designed heme-binding proteins and Ras-binding proteins. Our work shows that deep generative models of images or text can be fruitfully extended to complex physical objects such as protein structures by addressing outstanding issues such as data-out-of-distribution errors.
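As background for the term "denoising diffusion probabilistic model", the following is a minimal sketch of the generic DDPM forward (noising) step applied to a toy list of C-alpha coordinates. It is not SCUBA-D's actual parameterization: the linear beta schedule, the hypothetical helper names (`linear_beta_schedule`, `alpha_bar`, `q_sample`), and the use of raw Cartesian coordinates are illustrative assumptions, and the co-diffused sequence representation and adversarial losses described in the abstract are omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed, generic schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 1..t; t = 0 means no noise (returns 1.0)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def q_sample(x0, t, betas, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) for a list of 3D points.

    Assumption: each residue is a single (x, y, z) point; a real backbone model
    would also have to represent residue orientations.
    """
    abar = alpha_bar(betas, t)
    signal, noise_scale = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [
        tuple(signal * c + noise_scale * rng.gauss(0.0, 1.0) for c in atom)
        for atom in x0
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(1000)
    # Hypothetical toy C-alpha trace, 3.8 angstroms between consecutive residues.
    ca_trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    noisy = q_sample(ca_trace, t=500, betas=betas, rng=rng)
    print(noisy)
```

A trained denoising network would learn to reverse this process step by step, starting from near-pure noise at the final step and recovering a plausible backbone.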

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Crystallography, X-Ray / methods
  • Models, Molecular
  • Models, Statistical
  • Protein Conformation
  • Protein Folding
  • Proteins* / chemistry

Substances

  • Proteins