Unlike vision and language data, which typically come in a single canonical format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in 3D space. For molecular representation learning, most previous works design neural networks for only one particular data format, making the learned models likely to fail on other formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data in either 2D or 3D format as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M introduces two separate channels to encode 2D and 3D structural information and integrates them with atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and capture molecular representations correctly. We conduct extensive experiments on Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
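The channel-activation idea above can be illustrated with a minimal sketch: each modality contributes an additive bias to the attention matrix, and a channel contributes only when its structural information is present in the input. All names here are hypothetical illustrations, not the authors' actual implementation.

```python
import numpy as np

def combined_attention_bias(num_atoms, bias_2d=None, bias_3d=None):
    """Sum the attention biases of the active channels (hypothetical sketch).

    bias_2d: bias derived from 2D graph structure (e.g. shortest-path /
             edge encodings); None when no 2D information is provided.
    bias_3d: bias derived from 3D geometry (e.g. interatomic-distance
             encodings); None when no 3D information is provided.
    """
    bias = np.zeros((num_atoms, num_atoms))
    if bias_2d is not None:  # 2D channel activated
        bias += bias_2d
    if bias_3d is not None:  # 3D channel activated
        bias += bias_3d
    return bias

# A molecule given only as a 2D graph: the 3D channel stays disabled.
b2 = np.ones((4, 4))
only_2d = combined_attention_bias(4, bias_2d=b2)

# A molecule given with 3D coordinates as well: both channels contribute.
b3 = 2.0 * np.ones((4, 4))
both = combined_attention_bias(4, bias_2d=b2, bias_3d=b3)
```

In this sketch the disabled channel simply contributes nothing, so the same backbone weights process inputs of either format.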