We study a fundamental problem in computational chemistry, molecular conformation generation, which aims to predict stable 3D structures from 2D molecular graphs. Existing machine learning approaches usually first predict the distances between atoms and then generate a 3D structure that satisfies those distances, so noise in the predicted distances can induce additional errors during 3D coordinate generation. Inspired by the traditional force field methods for molecular dynamics simulation, in this paper we propose ConfGF, a novel approach that directly estimates the gradient fields of the log density of atomic coordinates. The estimated gradient fields allow stable conformations to be generated directly via Langevin dynamics. However, the problem is very challenging because the gradient fields are roto-translation equivariant. We observe that estimating the gradient fields of atomic coordinates can be translated into estimating the gradient fields of interatomic distances, and we therefore develop a novel algorithm based on recent score-based generative models to estimate these gradients effectively. Experimental results across multiple tasks show that ConfGF outperforms previous state-of-the-art baselines by a significant margin.
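The two ideas in the abstract, mapping distance gradients to coordinate gradients and sampling with Langevin dynamics, can be illustrated with a minimal sketch. This is not the ConfGF implementation. The function names (`make_toy_distance_score`, `coord_score_from_distance_score`, `langevin_sample`) are hypothetical, and the learned score network is replaced by an assumed toy Gaussian log density over each interatomic distance. The chain rule used is s(r_i) = Σ_j s(d_ij) · (r_i − r_j) / d_ij, and the sampler is a plain unadjusted Langevin update at a single noise level.

```python
import math
import random


def make_toy_distance_score(target, sigma):
    """Hypothetical stand-in for a learned score network over distances.

    Assumes log p(d_ij) = -(d_ij - target_ij)^2 / (2 sigma^2) up to a constant,
    so the score is d/dd log p = -(d_ij - target_ij) / sigma^2.
    """
    def score(i, j, d):
        return -(d - target[(i, j)]) / sigma ** 2
    return score


def coord_score_from_distance_score(coords, edges, dist_score):
    """Chain rule: s(r_i) = sum_j s(d_ij) * (r_i - r_j) / d_ij."""
    grads = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in edges:  # each undirected pair is listed once
        d = max(math.dist(coords[i], coords[j]), 1e-12)  # guard against d = 0
        s = dist_score(i, j, d)
        for k in range(3):
            g = s * (coords[i][k] - coords[j][k]) / d
            grads[i][k] += g  # d(d_ij)/d(r_i) = (r_i - r_j) / d_ij
            grads[j][k] -= g  # d(d_ij)/d(r_j) = -(r_i - r_j) / d_ij
    return grads


def langevin_sample(n_atoms, edges, dist_score, steps=2000, step_size=1e-3):
    """Unadjusted Langevin dynamics: r <- r + (eps / 2) * score + sqrt(eps) * z."""
    # Random Gaussian initial coordinates (an assumption of this sketch).
    coords = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    noise_scale = math.sqrt(step_size)
    for _ in range(steps):
        grads = coord_score_from_distance_score(coords, edges, dist_score)
        for i in range(n_atoms):
            for k in range(3):
                noise = noise_scale * random.gauss(0.0, 1.0)
                coords[i][k] += 0.5 * step_size * grads[i][k] + noise
    return coords


if __name__ == "__main__":
    random.seed(0)
    # Toy "molecule": three atoms whose pairwise distances should be near 1.5.
    edges = [(0, 1), (1, 2), (0, 2)]
    score_fn = make_toy_distance_score({e: 1.5 for e in edges}, sigma=0.1)
    conformation = langevin_sample(3, edges, score_fn)
    for i, j in edges:
        print(i, j, round(math.dist(conformation[i], conformation[j]), 3))
```

Because each pair contributes equal and opposite terms along r_i − r_j, the resulting coordinate gradients sum to zero over atoms and rotate together with the input coordinates, which is the roto-translation behavior the abstract refers to. Score-based generative models often anneal over a sequence of noise levels. The sketch keeps one fixed level for brevity.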