Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We define a diffusion process that models the structure as a system of harmonic oscillators and that naturally induces a cascading-resolution generative process along the eigenmodes of the system. On recent CAMEO targets, EigenFold achieves a median TMScore of 0.84 and, through its ensemble of sampled structures, provides a more comprehensive picture of model uncertainty than existing methods. We then assess EigenFold's ability to model and predict conformational heterogeneity in fold-switching proteins and in ligand-induced conformational change. Code is available at https://github.com/bjing2016/EigenFold.
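To make the harmonic-oscillator picture concrete, the following is a minimal sketch of how such a diffusion could be set up. It is not the EigenFold implementation. It assumes a toy chain in which consecutive residues are joined by springs of stiffness `alpha`, and a forward process of the form dx = -(1/2) H x dt + dW with H the chain Laplacian. Under these assumptions each eigenmode of H evolves as an independent Ornstein-Uhlenbeck process. The names `chain_laplacian` and `forward_marginal` are hypothetical and chosen for illustration only.

```python
import numpy as np

def chain_laplacian(n, alpha=1.0):
    """Hessian H of the toy energy (alpha/2) * sum_i |x[i+1] - x[i]|^2 (an assumption)."""
    H = np.zeros((n, n))
    idx = np.arange(n - 1)
    H[idx, idx] += alpha
    H[idx + 1, idx + 1] += alpha
    H[idx, idx + 1] -= alpha
    H[idx + 1, idx] -= alpha
    return H

def forward_marginal(x0, t, alpha=1.0, rng=None):
    """Sample x(t) given x(0) under dx = -(1/2) H x dt + dW, mode by mode.

    x0 has shape (n, 3). Returns the noised coordinates, the nonzero
    eigenvalues, and the per-mode signal decay factors.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x0.shape[0]
    lam, P = np.linalg.eigh(chain_laplacian(n, alpha))  # ascending eigenvalues
    lam, P = lam[1:], P[:, 1:]             # drop the zero (rigid translation) mode
    z0 = P.T @ (x0 - x0.mean(axis=0))      # project centered coordinates onto eigenmodes
    decay = np.exp(-0.5 * lam * t)         # signal retained in each mode at time t
    var = (1.0 - np.exp(-lam * t)) / lam   # noise variance accumulated in each mode
    noise = rng.standard_normal(z0.shape)
    zt = decay[:, None] * z0 + np.sqrt(var)[:, None] * noise
    return P @ zt, lam, decay
```

In this sketch the modes with large eigenvalues (fine, local detail) lose their signal fastest in the forward direction, so a reverse-time sampler would settle the low-frequency, global shape first and the local detail last. That ordering is one way to read the cascading-resolution behavior described above.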