Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm because it uses self-supervision signals and requires no human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that share common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. The KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) that encodes the complex information in the augmented molecular graph. The final module is a contrastive objective that maximizes agreement between these two views of the molecular graphs. Extensive experiments demonstrate that KCL outperforms state-of-the-art baselines on eight molecular datasets. Visualization experiments interpret what KCL has learned from atoms and attributes in the augmented molecular graphs. Our code and data are available in the supplementary materials.
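The augmentation and contrastive modules can be illustrated with a short plain-Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. The KG triples and relation names (`inPeriod`, `isNonMetal`, `inGroup`) are hypothetical toy data, and `augment` and `nt_xent` are illustrative names. The graph encoder and KMPNN are omitted, so the vectors passed to `nt_xent` stand in for their outputs. The sketch also assumes a SimCLR-style NT-Xent loss as the contrastive objective, which may differ from the paper's exact formulation.

```python
import math
from collections import defaultdict

# Hypothetical toy slice of a Chemical Element KG as (element, relation, attribute)
# triples. These are illustrative assumptions, not the KG built in the paper.
ELEMENT_KG = [
    ("C", "inPeriod", "Period2"),
    ("N", "inPeriod", "Period2"),
    ("O", "inPeriod", "Period2"),
    ("C", "isNonMetal", "NonMetal"),
    ("N", "isNonMetal", "NonMetal"),
    ("O", "isNonMetal", "NonMetal"),
    ("Cl", "inGroup", "Halogen"),
]


def augment(atoms, bonds, kg=ELEMENT_KG):
    """Knowledge-guided augmentation (sketch): add one node per KG attribute
    present in the molecule and link each atom to its element's attributes.

    atoms: element symbols indexed by atom id; bonds: (i, j) atom-index pairs.
    Returns (nodes, edges) with edges as (src, relation, dst) node names.
    """
    attrs_of = defaultdict(list)
    for elem, rel, attr in kg:
        attrs_of[elem].append((rel, attr))

    # Atom nodes come first, so nodes[i] is always atom i.
    nodes = [f"atom{i}:{e}" for i, e in enumerate(atoms)]
    edges = [(nodes[i], "bond", nodes[j]) for i, j in bonds]

    attr_nodes = {}
    for i, elem in enumerate(atoms):
        for rel, attr in attrs_of[elem]:
            if attr not in attr_nodes:  # create each shared attribute node once
                attr_nodes[attr] = f"attr:{attr}"
                nodes.append(attr_nodes[attr])
            edges.append((nodes[i], rel, attr_nodes[attr]))
    return nodes, edges


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss (assumed objective): z1[k] and z2[k] embed the original and
    augmented views of molecule k; all other embeddings act as negatives."""
    n = len(z1)
    z = list(z1) + list(z2)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same molecule
        num = math.exp(cosine(z[i], z[pos]) / tau)
        den = sum(math.exp(cosine(z[i], z[j]) / tau) for j in range(2 * n) if j != i)
        total += -math.log(num / den)
    return total / (2 * n)


if __name__ == "__main__":
    # Formamide heavy atoms N-C-O: N and O are not bonded to each other.
    nodes, edges = augment(["N", "C", "O"], [(0, 1), (1, 2)])
    print(nodes)
    print(nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In the formamide example, N and O share no bond, yet after augmentation both link to the `attr:Period2` and `attr:NonMetal` nodes, which is the kind of non-bonded correlation the augmentation is meant to expose. Matching view pairs yield a lower loss than mismatched ones, so minimizing it pulls the two views of each molecule together.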