Molecular dynamics simulations are important in computational physics, chemistry, materials science, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energies and properties, and they are much faster than density functional theory (DFT) calculations. Molecular energy depends at least on atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs. Previous Transformer models take only atoms as inputs and therefore lack explicit modeling of these factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using a rotationally and translationally invariant, geometry-aware spatial encoding. The proposed spatial encoding computes relative position information, including distances and angles, among nodes and edges. We benchmark Moleformer on the OC20 and QM9 datasets. Our model achieves state-of-the-art results on the initial state to relaxed energy prediction task of OC20 and is highly competitive on QM9 in predicting quantum chemical properties compared with other Transformer and Graph Neural Network (GNN) methods, which demonstrates the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.
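The invariance property mentioned above can be illustrated with a minimal sketch. It is not Moleformer's implementation: the helper names (`pairwise_features`, `rigid_transform`), the water-like coordinates, and the choice of a z-axis rotation are all assumptions made for illustration. The sketch only shows that node-to-node distances and edge-to-edge angles, the kind of relative position information the abstract describes, do not change when the whole molecule is rotated and translated.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle(u, v):
    """Angle in radians between 3D vectors u and v (atan2 form, stable near 0 and pi)."""
    return math.atan2(norm(cross(u, v)), dot(u, v))

def pairwise_features(positions, edges):
    """Hypothetical helper: node-node distance matrix and edge-edge angle matrix."""
    n = len(positions)
    dists = [[norm(sub(positions[i], positions[j])) for j in range(n)]
             for i in range(n)]
    # Each edge (i, j) is represented by the vector from atom i to atom j.
    vecs = [sub(positions[j], positions[i]) for i, j in edges]
    angles = [[angle(u, v) for v in vecs] for u in vecs]
    return dists, angles

def rigid_transform(positions, theta, shift):
    """Rotate every atom about the z axis by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in positions]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Illustrative water-like geometry (angstroms); the values are assumptions.
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
edges = [(0, 1), (0, 2)]  # two O-H bonds

d0, a0 = pairwise_features(positions, edges)
moved = rigid_transform(positions, theta=0.7, shift=(1.5, -2.0, 3.0))
d1, a1 = pairwise_features(moved, edges)

# Both differences are at floating-point noise level.
print(max_abs_diff(d0, d1), max_abs_diff(a0, a1))
```

Because the features depend only on differences between atomic positions and on inner and cross products of those differences, any encoding built from them inherits the same invariance, whereas raw Cartesian coordinates would not.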