Unlike vision and language data, which usually have a single standard format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in 3D space. For molecular representation learning, most previous works designed neural networks for only one particular data format, making the learned models likely to fail on other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work we develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in 2D or 3D format as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separate channels to encode 2D and 3D structural information and incorporates them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled. By training on 2D and 3D molecular data with properly designed supervision signals, Transformer-M automatically learns to leverage knowledge from different data modalities and to capture the representations correctly. We conducted extensive experiments for Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
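The channel-switching idea described above can be illustrated with a small sketch. This is not the released Transformer-M code: it assumes, purely for illustration, that each channel produces an additive pairwise term over atoms, that the 2D channel is driven by hop counts on the bond graph, and that the 3D channel is driven by Euclidean distances. The names `hop_distances` and `structural_bias` are hypothetical, and the negated distances stand in for encodings that a real model would learn.

```python
import math
from collections import deque


def hop_distances(num_atoms, bonds):
    """Hop-count distance between every atom pair on the 2D bond graph.

    Uses breadth-first search from each atom; unreachable pairs stay at -1.
    """
    adjacency = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    dist = [[-1] * num_atoms for _ in range(num_atoms)]
    for source in range(num_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def structural_bias(num_atoms, bonds=None, coords=None):
    """Sum the pairwise terms of whichever channels have input.

    Assumption: a channel whose input is missing (None) is disabled and
    contributes nothing. The toy encodings (negated distances) are
    placeholders for learned functions.
    """
    bias = [[0.0] * num_atoms for _ in range(num_atoms)]
    if bonds is not None:  # 2D channel active
        hops = hop_distances(num_atoms, bonds)
        for i in range(num_atoms):
            for j in range(num_atoms):
                if hops[i][j] >= 0:  # skip disconnected pairs
                    bias[i][j] += -float(hops[i][j])
    if coords is not None:  # 3D channel active
        for i in range(num_atoms):
            for j in range(num_atoms):
                bias[i][j] += -math.dist(coords[i], coords[j])
    return bias


if __name__ == "__main__":
    # Toy three-atom molecule: atom 0 bonded to atoms 1 and 2.
    bonds = [(0, 1), (0, 2)]
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(structural_bias(3, bonds=bonds))                 # 2D input only
    print(structural_bias(3, coords=coords))               # 3D input only
    print(structural_bias(3, bonds=bonds, coords=coords))  # both channels
```

Because a disabled channel simply adds nothing, the same function accepts 2D-only, 3D-only, or combined input without any change to the surrounding model, which is the property the abstract emphasizes.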