In recent years, artificial intelligence has played an important role in accelerating the drug discovery process. Various molecular representation schemes in different modalities (e.g., textual sequences or graphs) have been developed. Once digitally encoded, each representation allows different chemical information to be learned through a corresponding network structure. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular inputs for molecular representation learning. Previous works have attempted to combine the two to address the loss of specific information that single-modal representations suffer on various tasks. To fuse such multi-modal information further, the correspondence between the chemical features learned from the different representations should be considered. To this end, we propose a novel framework for molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing a bond-level graph representation as an attention bias in the Transformer, which reinforces feature correspondence between the multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Extensive experiments on public property prediction datasets demonstrate the effectiveness of our model.
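To make the attention-bias idea concrete, the following is a minimal sketch of single-head scaled dot-product self-attention with an additive bias term. It is not the MMSG implementation: the function names `softmax` and `biased_attention` are hypothetical, and the `bias` matrix is assumed to be a precomputed scalar per token pair (for example, derived from bond-level graph features). How such a bias is actually obtained from the graph is not specified here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias (illustrative only).

    q, k: lists of n vectors of length d (queries and keys).
    v:    list of n value vectors.
    bias: n x n matrix; bias[i][j] is added to the score of token i
          attending to token j (assumed here to come from bond-level
          graph features; that mapping is an assumption, not shown).
    """
    n, d = len(q), len(q[0])
    scale = math.sqrt(d)
    out = []
    for i in range(n):
        # Raw attention score plus the structural bias for each pair (i, j).
        scores = [
            sum(q[i][t] * k[j][t] for t in range(d)) / scale + bias[i][j]
            for j in range(n)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append(
            [sum(weights[j] * v[j][t] for j in range(n)) for t in range(len(v[0]))]
        )
    return out
```

With an all-zero `bias` this reduces to standard self-attention. A large positive entry `bias[i][j]` pushes token `i` to attend to token `j`, which is the sense in which graph structure can steer the sequence model.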