Constructing a scaffold structure that supports a desired motif, which confers protein function, shows promise for the design of vaccines and enzymes. But this motif-scaffolding problem has no general solution yet. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to 20 residues) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds of up to 80 residues and (2) produce structurally diverse scaffolds for a fixed motif.
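The abstract does not say which metric is used to compare a designed backbone with its AlphaFold2-predicted structure. As a purely illustrative sketch, the snippet below uses distance-matrix RMSD (dRMSD) as a stand-in. It compares all pairwise residue distances, so it needs no explicit superposition and is unchanged by rotations and translations. The function names `pairwise_distances` and `drmsd`, and the toy coordinates, are hypothetical and not taken from the source.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return the distances between all residue pairs (i < j) of a backbone."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(designed, predicted):
    """Distance-matrix RMSD between two backbones with matching residues.

    Assumption: one 3D point per residue (e.g. the C-alpha atom), with both
    lists in the same residue order.
    """
    if len(designed) != len(predicted):
        raise ValueError("backbones must have the same number of residues")
    if len(designed) < 2:
        raise ValueError("need at least two residues")
    d1 = pairwise_distances(designed)
    d2 = pairwise_distances(predicted)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d1, d2)) / len(d1))


def rotate_and_shift(point):
    """Rotate 90 degrees about the z-axis, then translate by (1, 2, 3)."""
    x, y, z = point
    return (-y + 1.0, x + 2.0, z + 3.0)


# Toy three-residue "backbone" (hypothetical coordinates, in angstroms).
designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

# A rigidly moved copy should score (numerically) zero.
moved = [rotate_and_shift(p) for p in designed]
print("rigid copy:", drmsd(designed, moved))

# A copy with one displaced residue should score above zero.
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 4.8, 0.0)]
print("distorted:", drmsd(designed, distorted))
```

Because dRMSD depends only on internal distances, it also cannot distinguish a structure from its mirror image. A superposition-based score would be needed where chirality matters.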