Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.
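To make the evaluation step concrete, the sketch below shows one common way to quantify how well a designed backbone aligns with a predicted structure: the root-mean-square deviation (RMSD) of matched C-alpha coordinates after optimal rigid superposition (the Kabsch algorithm). This is an illustrative assumption, not necessarily the metric used in this work; the abstract does not name the alignment score. The function name `kabsch_rmsd` and the `(N, 3)` coordinate layout are hypothetical choices made for the example.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    P: designed backbone coordinates (e.g. C-alpha atoms), assumed layout (N, 3).
    Q: predicted structure coordinates for the same N residues, in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both point sets at the origin.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (determinant +1), excluding reflections.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered P onto Q and measure the residual deviation.
    P_aligned = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Qc) ** 2, axis=1))))
```

Calling `kabsch_rmsd(designed_ca, predicted_ca)` on two arrays of matched residues returns a single distance in the units of the input coordinates; lower values indicate closer agreement. Because the superposition removes rotation and translation, the score does not depend on how either structure is placed in space.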