After the recent ground-breaking advances in protein structure prediction, one of the remaining challenges in protein machine learning is to reliably predict distributions of structural states. Parametric models of small-scale fluctuations are difficult to fit because of the complex covariance structure between degrees of freedom in the protein chain, which often causes models to violate either local or global structural constraints. In this paper, we present a new strategy for modelling protein densities in internal coordinates, which uses constraints in 3D space to induce covariance structure between the internal degrees of freedom. We illustrate the potential of the procedure by constructing a variational autoencoder whose full-covariance output is induced by the constraints implied by the conditional mean in 3D, and demonstrate that our approach makes it possible to scale density models of internal coordinates to full-size proteins.
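
To make the central idea concrete, the following is a minimal sketch of how constraints on Cartesian fluctuations can induce covariance between internal degrees of freedom. It is not the implementation from the paper: it assumes a toy planar chain with unit bond lengths whose only internal coordinates are turn angles, a first-order (Jacobian) approximation of atom displacements, and a precision matrix of the form prior precision plus a weighted sum of `J_m^T J_m` terms. The function names, the weights `lam`, and the prior standard deviations `prior_std` are all hypothetical choices made for illustration.

```python
import numpy as np


def atom_positions(theta, bond_length=1.0):
    """Place atoms of a planar chain from turn angles (toy internal coordinates)."""
    phi = np.cumsum(theta)  # absolute direction of each bond
    steps = bond_length * np.stack([np.cos(phi), np.sin(phi)], axis=1)
    return np.cumsum(steps, axis=0)  # shape (N, 2)


def position_jacobians(theta, bond_length=1.0):
    """Return jac[m, :, j] = d x_m / d theta_j, shape (N, 2, N)."""
    n = theta.size
    phi = np.cumsum(theta)
    # Derivative of each bond vector with respect to its own absolute direction.
    dstep = bond_length * np.stack([-np.sin(phi), np.cos(phi)], axis=1)
    jac = np.zeros((n, 2, n))
    for m in range(n):
        for j in range(m + 1):
            # theta_j rotates every bond k >= j; atom m depends on bonds k <= m.
            jac[m, :, j] = dstep[j:m + 1].sum(axis=0)
    return jac


def induced_covariance(theta, prior_std, lam):
    """Assumed form: precision = diag(1 / prior_std^2) + sum_m lam[m] * J_m^T J_m."""
    jac = position_jacobians(theta)
    precision = np.diag(1.0 / prior_std**2)
    for m in range(theta.size):
        # Penalise first-order Cartesian displacement of atom m.
        precision += lam[m] * jac[m].T @ jac[m]
    return np.linalg.inv(precision)


# Hypothetical example values: 8 turn angles around a gently curving mean.
theta = np.full(8, 0.3)
prior_std = np.full(8, 0.2)
lam = np.full(8, 5.0)

cov = induced_covariance(theta, prior_std, lam)

# Compare the first-order positional variance of the last atom under an
# independent (diagonal) covariance and under the constraint-induced covariance.
jac_end = position_jacobians(theta)[-1]
var_independent = np.trace(jac_end @ np.diag(prior_std**2) @ jac_end.T)
var_induced = np.trace(jac_end @ cov @ jac_end.T)
print("end-atom variance, independent angles:", var_independent)
print("end-atom variance, induced covariance:", var_induced)
```

In this sketch the induced covariance is no longer diagonal: angles become correlated so that fluctuations in one part of the chain are compensated elsewhere, which reduces the accumulated displacement of distant atoms compared with sampling each angle independently.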