Pretraining molecular representation models without labels is fundamental to a wide range of applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, so their pretrained models cannot characterize 3D geometry and are therefore deficient on downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. We first propose adopting an equivariant energy-based model as the pretraining backbone, which has the merit of respecting the symmetry of 3D space. We then develop a node-level pretraining loss for force prediction, in which we exploit the Riemann-Gaussian distribution to make the loss E(3)-invariant and hence more robust. In addition, we leverage a graph-level noise-scale prediction task to further improve the final performance. We pretrain our model on the large-scale 3D dataset GEOM-QM9 and evaluate it on two challenging 3D benchmarks, MD17 and QM9. The experimental results support the superior efficacy of our method over current state-of-the-art pretraining approaches and verify the validity of each proposed component.
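To make the symmetry argument concrete, the following is a minimal sketch, not the model proposed here: it assumes a toy energy built only from interatomic distances and checks numerically that such an energy is E(3)-invariant while the forces derived from it (the negative gradient with respect to atom positions) are equivariant. The names `toy_energy`, `toy_forces`, and `random_orthogonal` are hypothetical and introduced purely for illustration.

```python
import numpy as np


def toy_energy(pos):
    """Toy invariant energy (an assumption): sum over atom pairs of (d_ij - 1)^2."""
    diff = pos[:, None, :] - pos[None, :, :]        # pairwise displacement, (n, n, 3)
    dist = np.linalg.norm(diff, axis=-1)            # pairwise distances, (n, n)
    iu = np.triu_indices(len(pos), k=1)             # count each pair once
    return float(np.sum((dist[iu] - 1.0) ** 2))


def toy_forces(pos):
    """Forces as the analytic negative gradient of toy_energy w.r.t. positions."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)                     # avoid 0/0; diagonal terms then vanish
    coeff = 2.0 * (dist - 1.0) / dist               # (dE/dd_ij) / d_ij
    grad = np.sum(coeff[:, :, None] * diff, axis=1) # dE/dx_i, shape (n, 3)
    return -grad


def random_orthogonal(rng):
    """Random 3x3 orthogonal matrix (rotation or reflection) via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q


rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))                       # 5 atoms with random coordinates
Q = random_orthogonal(rng)
t = rng.normal(size=3)
pos_moved = pos @ Q.T + t                           # apply an E(3) transformation

# The energy is unchanged; the forces rotate together with the molecule.
print(np.isclose(toy_energy(pos), toy_energy(pos_moved)))
print(np.allclose(toy_forces(pos_moved), toy_forces(pos) @ Q.T))
```

Because the energy depends on the coordinates only through distances, any rotation, reflection, or translation leaves it unchanged, and the force on each atom transforms with the same orthogonal matrix. A force-prediction loss that compares such quantities consistently in either frame is therefore unaffected by how the molecule is posed in space.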