The design of novel protein structures remains a challenge in protein engineering, with applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there is no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in ℝ³, that operates on frames and confers group invariance. We address these shortcomings by developing theoretical foundations for SE(3)-invariant diffusion models on multiple frames, followed by a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find that it can generate designable monomers of up to 500 amino acids without relying on a pretrained protein structure prediction network, which has been integral to previous methods. We find that our samples can generalize beyond any known protein structure.
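To make "frames" and "SE(3) invariance over multiple frames" concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not FrameDiff's implementation: each frame is assumed to be a pair (R, t) acting as x ↦ Rx + t, a global rigid motion g is assumed to act on every frame by left composition, and the helper names (`compose`, `inverse`, `relative`, `apply_global`, `frames_close`) are hypothetical. The check shows that relative frames T_i⁻¹ ∘ T_j are unchanged when the whole set of frames is moved by the same rigid motion, and that every rotation stays orientation-preserving (determinant +1).

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Frame = Tuple[Mat3, Vec3]  # assumed encoding: (R, t) acts as x -> R x + t


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def det3(a: Mat3) -> float:
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def rot_z(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def compose(f: Frame, g: Frame) -> Frame:
    """f o g: apply g first, then f, i.e. x -> R_f (R_g x + t_g) + t_f."""
    (rf, tf), (rg, tg) = f, g
    return matmul(rf, rg), [a + b for a, b in zip(matvec(rf, tg), tf)]


def inverse(f: Frame) -> Frame:
    """Inverse rigid motion: y -> R^T (y - t), using R^{-1} = R^T for a rotation."""
    r, t = f
    rt = transpose(r)
    return rt, [-x for x in matvec(rt, t)]


def relative(fi: Frame, fj: Frame) -> Frame:
    """Pose of frame j expressed in the local coordinates of frame i."""
    return compose(inverse(fi), fj)


def apply_global(g: Frame, frames: List[Frame]) -> List[Frame]:
    """Move every frame by the same global rigid motion g (left composition)."""
    return [compose(g, f) for f in frames]


def frames_close(f: Frame, g: Frame, tol: float = 1e-9) -> bool:
    (rf, tf), (rg, tg) = f, g
    return (all(abs(rf[i][j] - rg[i][j]) < tol for i in range(3) for j in range(3))
            and all(abs(a - b) < tol for a, b in zip(tf, tg)))


# Usage: two hypothetical residue frames and an arbitrary global rigid motion.
frames = [(rot_z(0.3), [1.0, 0.0, 0.0]), (rot_x(1.1), [0.0, 2.0, -1.0])]
g = (matmul(rot_z(0.7), rot_x(-0.4)), [5.0, -3.0, 2.0])
moved = apply_global(g, frames)

# Orientation-preserving: every rotation still has determinant +1.
print(all(abs(det3(r) - 1.0) < 1e-9 for r, _ in moved))  # True
# Relative frames are unchanged by the global motion: (g Ti)^-1 (g Tj) = Ti^-1 Tj.
print(frames_close(relative(frames[0], frames[1]), relative(moved[0], moved[1])))  # True
```

The invariance follows algebraically from (g ∘ T_i)⁻¹ ∘ (g ∘ T_j) = T_i⁻¹ ∘ g⁻¹ ∘ g ∘ T_j = T_i⁻¹ ∘ T_j. Features built only from such relative frames are one common way to obtain SE(3)-invariant quantities over multiple frames; how FrameDiff parameterizes its SE(3)-equivariant score is not shown here.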