MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph

Recent years have seen rapid growth in the use of graph neural networks (GNNs) in the biomedical domain to tackle drug-related problems. However, like other deep architectures, GNNs are data hungry. Because obtaining labels in the real world is often expensive, pretraining GNNs in an unsupervised manner has been actively explored. Among such approaches, graph contrastive learning, which maximizes the mutual information between paired graph augmentations, has been shown to be effective on various downstream tasks. However, the current graph contrastive learning framework has two limitations. First, the augmentations are designed for general graphs and thus may not be suitable or powerful enough for certain domains. Second, the contrastive scheme only learns representations that are invariant to local perturbations and thus does not consider the global structure of the dataset, which may also be useful for downstream tasks. Therefore, in this paper, we study graph contrastive learning in the context of the biomedical domain, where molecular graphs are present. We propose a novel framework called MoCL, which utilizes domain knowledge at both the local and global levels to assist representation learning. The local-level domain knowledge guides the augmentation process such that variation is introduced without changing graph semantics. The global-level knowledge encodes the similarity information between graphs in the entire dataset and helps to learn representations with richer semantics. The entire model is learned through a double contrast objective. We evaluate MoCL on various molecular datasets under both linear and semi-supervised settings, and the results show that MoCL achieves state-of-the-art performance.
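To make the idea of a double contrast objective more concrete, here is a minimal Python sketch under stated assumptions; it is not the MoCL implementation. Embeddings are plain lists of floats standing in for GNN outputs. The local term is a simple one-directional InfoNCE loss in which `z1[i]` and `z2[i]` embed two augmentations of molecule `i`. The global term assumes that molecules whose fingerprints (given as sets of on-bits) have Tanimoto similarity at or above a hypothetical `threshold` should be treated as extra positives for each other. All function names, the weight `lam`, the temperature `tau`, and this particular form of the global term are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embeddings (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def _log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def info_nce(z1: List[Vector], z2: List[Vector], tau: float = 0.5) -> float:
    """Local contrast: z1[i] and z2[i] embed two augmentations of molecule i.

    Each anchor z1[i] is scored against every z2[j]; the loss is low when the
    matching view z2[i] has the highest similarity (one-directional InfoNCE).
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        total += _log_sum_exp(logits) - logits[i]
    return total / n


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def global_contrast(z: List[Vector], fps: List[Set[int]],
                    threshold: float = 0.5, tau: float = 0.5) -> float:
    """Global contrast (assumed form): molecules whose fingerprints have
    Tanimoto similarity >= threshold act as extra positives for each other,
    and all other molecules in the batch act as negatives."""
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if tanimoto(fps[i], fps[j]) >= threshold]
        if not positives:
            continue  # no fingerprint neighbors: this anchor contributes nothing
        logits = {j: cosine(z[i], z[j]) / tau for j in others}
        log_denom = _log_sum_exp(list(logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def double_contrast(z1: List[Vector], z2: List[Vector], fps: List[Set[int]],
                    lam: float = 0.5, tau: float = 0.5) -> float:
    """Assumed double contrast objective: local term plus lam times the global term."""
    return info_nce(z1, z2, tau) + lam * global_contrast(z1, fps, tau=tau)
```

For example, with `z1 = z2 = [[1.0, 0.0], [0.0, 1.0]]` each view matches its partner exactly and `info_nce` returns log(1 + e^-2), about 0.127; swapping the rows of `z2` raises it to about 2.127. A weighted sum with `lam` is just one simple way to balance invariance to local perturbations against dataset-level similarity.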

翻译：近些年来,生物医学领域利用图形神经网络(GNNs)处理药物相关问题的情况迅速增加。然而,与任何其他深层结构一样,GNNs的数据饥饿。虽然在现实世界中需要标签往往费用昂贵,但以不受监督的方式对GNNs进行预先培训,对此进行了积极探讨。其中,通过最大限度地利用配对图形增强之间的相互信息,图表对比学习在各种下游任务中显示出了有效性。然而,目前的图形对比学习框架有两个局限性。首先,扩展是针对一般图表设计的,因此对某些领域来说可能不够合适或强大。第二,对比性办法只学习对本地扰动不易的表达方式,因此没有考虑数据集的全球结构结构,这也可能对下游任务有用。因此,在本文中,我们研究了在生物医学领域(存在分子模型)范围内的对比性学习。我们建议了一个名为MCLU的新框架,它利用本地和全球两级的域知识来帮助学习某些领域。第二,比较性比较方案只学习对本地和全球一级的内程结构进行不易变。我们所了解的域域数据,通过比较性数据库显示这种变的轨道数据。我们所学的变的轨道数据,通过在图表中进行这种变。