Transformers have become the architecture of choice in many applications thanks to their ability to represent complex interactions between elements. However, extending the Transformer architecture to non-sequential data such as molecules, and enabling its training on small datasets, remains a challenge. In this work, we introduce a Transformer-based architecture for molecular property prediction that is able to capture the geometry of the molecule. We replace the classical positional encoding with an initial encoding of the molecular geometry and introduce a learned gated self-attention mechanism. We further propose an augmentation scheme for molecular data that avoids the overfitting induced by the overparameterized architecture. The proposed framework outperforms state-of-the-art methods while relying on machine learning alone: it incorporates no domain knowledge from quantum chemistry and uses no extended geometric inputs besides pairwise atomic distances.
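As a concrete illustration of the mechanism summarized above, the following is a minimal PyTorch sketch of a distance-gated self-attention layer: standard scaled dot-product attention whose per-head weights are modulated by a learned gate computed from pairwise atomic distances. All names and design details here (the `DistanceGatedSelfAttention` class, the radial-basis expansion, the sigmoid gate, the renormalization) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code) of gated self-attention driven by
# pairwise atomic distances. Design choices below are assumptions for
# illustration only.
import torch
import torch.nn as nn


class DistanceGatedSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, num_rbf: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # Radial-basis expansion of distances, mapped to a per-head gate in (0, 1).
        self.rbf_centers = nn.Parameter(torch.linspace(0.0, 10.0, num_rbf))
        self.rbf_gamma = 10.0 / num_rbf
        self.gate = nn.Linear(num_rbf, num_heads)

    def forward(self, x: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # x: (batch, atoms, dim); dist: (batch, atoms, atoms) pairwise distances.
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim**0.5  # (b, h, n, n)
        # Learned gate from geometry: nearby atom pairs can be emphasized and
        # distant ones suppressed, using only pairwise distances as input.
        rbf = torch.exp(-self.rbf_gamma * (dist.unsqueeze(-1) - self.rbf_centers) ** 2)
        g = torch.sigmoid(self.gate(rbf)).permute(0, 3, 1, 2)  # (b, h, n, n)
        attn = torch.softmax(scores, dim=-1) * g
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)
```

In this reading, the gate plays the role of a geometry-aware positional encoding: no bond graph, angles, or quantum-chemical descriptors enter the layer, only the interatomic distance matrix.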