The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments for currently incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures with neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins biologically twist into energetically favorable conformations, but the inherent shift and rotational invariance of this representation also alleviates the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that the resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.
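To make the angle-based formulation concrete, the following is a minimal sketch of the forward (noising) half of a denoising diffusion process applied to a backbone described as per-residue angles. It is not the released implementation. The abstract does not state the number of angles per residue, the noise schedule, or how periodicity is handled, so the six angles per residue, the linear schedule, and the helper names `wrap_angle`, `linear_beta_schedule`, and `forward_noise` are all assumptions chosen for illustration.

```python
import numpy as np


def wrap_angle(x):
    """Wrap angles (radians) into the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumption; the abstract names no schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(angles, t, alpha_bar, rng):
    """Sample x_t from the standard DDPM forward process q(x_t | x_0).

    The result is wrapped back into [-pi, pi) so it stays a valid set of
    angles. Wrapping is an assumed way of handling periodicity.
    """
    eps = rng.standard_normal(angles.shape)
    noisy = np.sqrt(alpha_bar[t]) * angles + np.sqrt(1.0 - alpha_bar[t]) * eps
    return wrap_angle(noisy), eps


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)

# Hypothetical toy backbone: 64 residues, 6 angles per residue.
x0 = rng.uniform(-np.pi, np.pi, size=(64, 6))

# At the final timestep the sample is close to pure noise, which plays the
# role of the random, unfolded starting state that generation denoises from.
xt, eps = forward_noise(x0, 999, alpha_bar, rng)
```

Because every quantity here is an internal angle rather than a Cartesian coordinate, translating or rotating the whole protein leaves `x0` unchanged, which is why a plain sequence model can in principle serve as the denoiser in place of an equivariant network.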