Molecular conformation generation aims to generate the three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology. Previous distance-based methods first predict interatomic distances and then generate conformations from them, which can produce mutually conflicting distances. In this work, we propose a method that directly predicts the coordinates of atoms. We design a dedicated loss function for conformation generation that is invariant to roto-translation of the coordinates of a conformation and to permutation of symmetric atoms in the molecule. We further design a backbone model that stacks multiple blocks, where each block refines the conformation generated by its preceding block. Our method achieves state-of-the-art results on four public benchmarks: on small-scale GEOM-QM9 and GEOM-Drugs, which have $200$K training examples, we improve the previous best matching score by $3.5\%$ and $28.9\%$, respectively; on large-scale GEOM-QM9 and GEOM-Drugs, which have millions of training examples, the two improvements are $47.1\%$ and $36.3\%$. These results show the effectiveness of our method and the great potential of the direct approach. Our code is released at \url{https://github.com/DirectMolecularConfGen/DMCG}.
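To make the invariance requirement concrete, the following is a minimal sketch of a coordinate distance that does not change under rotation, translation, or relabeling of symmetric atoms. It is not the loss function proposed in this work, whose exact form is not given in the abstract. It uses the standard Kabsch alignment to remove rotation and translation, and then takes the minimum over a caller-supplied list of atom permutations. The names `kabsch_rmsd`, `invariant_rmsd`, and `symmetric_perms` are hypothetical, and the sketch assumes the symmetry-preserving permutations are already known.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid alignment."""
    # Remove translation by centering both point sets at the origin.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # Kabsch algorithm: the SVD of the 3x3 covariance matrix yields the rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    # Rotate the centered prediction onto the centered reference and measure the error.
    diff = p @ rot - q
    return float(np.sqrt((diff ** 2).sum() / len(ref)))


def invariant_rmsd(pred, ref, symmetric_perms):
    """Minimum aligned RMSD over the given atom permutations.

    symmetric_perms is an assumed input: an iterable of index tuples, each a
    reordering of the atoms that maps the molecule onto itself. It should
    include the identity permutation.
    """
    return min(kabsch_rmsd(pred[list(perm)], ref) for perm in symmetric_perms)
```

A prediction that differs from the reference only by a rigid motion and a swap of two symmetric atoms scores approximately zero under `invariant_rmsd`, while `kabsch_rmsd` alone would penalize the swap. Enumerating permutations explicitly is only practical for small symmetry groups; it is used here for clarity, not efficiency.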