Machine learning has become a promising approach to molecular modeling. Positional quantities, such as interatomic distances and bond angles, play a crucial role in molecular physics. Existing works rely on carefully hand-designed representations of these quantities. To model the complex nonlinearity involved in predicting molecular properties in a more end-to-end manner, we propose to encode positional quantities with a learnable embedding that is continuous and differentiable. A regularization technique is employed to encourage the embedding to vary smoothly along the physical dimension. We experiment with a variety of molecular property and force field prediction tasks. We observe improved performance for three different model architectures after plugging in the proposed positional encoding method. In addition, the learned positional encoding allows easier physics-based interpretation: we observe that tasks governed by similar physics learn similar positional encodings.
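The abstract does not specify the form of the embedding or the regularizer. One plausible reading, sketched here purely under our own assumptions, is a learnable table of vectors attached to fixed grid points ("knots") along the physical axis (e.g. distance in Å). An input value is mapped to a Gaussian-weighted average of table rows, so the output is continuous and differentiable in the input. The smoothness regularizer is taken to be the sum of squared differences between neighbouring rows. The names `LearnablePositionalEncoding` and `smoothness_penalty`, the Gaussian kernel, the evenly spaced grid, and the default kernel width are all hypothetical choices, not the authors' method.

```python
import numpy as np


class LearnablePositionalEncoding:
    """Hypothetical sketch: map a scalar positional quantity (e.g. an
    interatomic distance) to a vector by softly interpolating rows of a
    learnable table. Kernel, grid, and width are illustrative assumptions."""

    def __init__(self, num_bins, dim, r_min, r_max, width=None, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable parameters: one embedding row per knot on the physical axis.
        self.table = rng.normal(scale=0.1, size=(num_bins, dim))
        # Fixed, evenly spaced knot positions (requires num_bins >= 2).
        self.centers = np.linspace(r_min, r_max, num_bins)
        # Assumed default: Gaussian width equal to the knot spacing.
        self.width = width if width is not None else self.centers[1] - self.centers[0]

    def __call__(self, r):
        r = np.asarray(r, dtype=float)[..., None]          # (..., 1)
        logits = -((r - self.centers) / self.width) ** 2    # (..., num_bins)
        logits -= logits.max(axis=-1, keepdims=True)        # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=-1, keepdims=True)      # normalized soft weights over knots
        # A weighted average of table rows: smooth in r, linear in the table.
        return weights @ self.table                          # (..., dim)


def smoothness_penalty(table):
    """Hypothetical regularizer: sum of squared differences between the
    embeddings of neighbouring knots, penalizing abrupt changes along the axis."""
    diffs = np.diff(table, axis=0)
    return float(np.sum(diffs ** 2))


# Illustrative usage with made-up distances (not data from the paper).
pe = LearnablePositionalEncoding(num_bins=32, dim=16, r_min=0.0, r_max=6.0)
distances = np.array([0.96, 1.09, 1.54])
embeddings = pe(distances)                  # shape (3, 16)
reg_loss = smoothness_penalty(pe.table)     # added to the task loss with some weight
```

In training, the table would be updated by gradient descent together with the downstream model, with `reg_loss` scaled by a tunable coefficient. Soft Gaussian weighting is used instead of hard binning so that the encoding has no discontinuities at bin edges, which matters when force fields are obtained by differentiating predictions with respect to atomic positions.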