Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, so their pretrained models cannot characterize 3D geometry and are therefore deficient for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose to adopt an equivariant energy-based model as the backbone for pretraining, which has the merit of respecting the symmetries of 3D space. We then develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure that the loss is E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model, pretrained on the large-scale 3D dataset GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of each proposed component.
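To make the E(3)-invariance requirement concrete, the sketch below shows a toy example of this property: a loss built on intramolecular pairwise distances is unchanged under any rotation, reflection, or translation of the input coordinates. This is only an illustration of the invariance principle, not the paper's actual Riemann-Gaussian force-prediction loss; all function names here are hypothetical.

```python
import numpy as np

def pairwise_distance_loss(pred_coords, target_coords):
    """Toy E(3)-invariant loss: compares intramolecular pairwise
    distances, which are preserved by rigid motions and reflections."""
    def dists(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.linalg.norm(diff, axis=-1)
    return np.mean((dists(pred_coords) - dists(target_coords)) ** 2)

rng = np.random.default_rng(0)
pred = rng.normal(size=(5, 3))    # predicted 3D coordinates of 5 atoms
target = rng.normal(size=(5, 3))  # reference coordinates

# Apply a random orthogonal transform (rotation or reflection, via QR)
# plus a translation to the prediction -- an arbitrary E(3) element.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
transformed = pred @ q.T + t

loss_before = pairwise_distance_loss(pred, target)
loss_after = pairwise_distance_loss(transformed, target)
assert np.isclose(loss_before, loss_after)  # invariance holds
```

Pairwise distances are one simple invariant feature; the paper's Riemann-Gaussian construction achieves the same symmetry guarantee for the node-level force-prediction objective.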