This paper studies unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks such as molecular property prediction in drug and material discovery. Existing methods mainly focus on preserving the local similarity structure between different graph instances but fail to discover the global semantic structure of the entire data set. In this paper, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Specifically, besides preserving local similarities, GraphLoG introduces hierarchical prototypes to capture the global semantic clusters. An efficient online expectation-maximization (EM) algorithm is further developed for learning the model. We evaluate GraphLoG by pre-training it on massive unlabeled graphs followed by fine-tuning on downstream tasks. Extensive experiments on both chemical and biological benchmark data sets demonstrate the effectiveness of the proposed approach.
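To make the online EM idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): graph embeddings are softly clustered by assigning each to its nearest prototype (E-step), and prototypes are updated with an exponential moving average over their assigned members (an online approximation of the M-step). All names, dimensions, and the momentum value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 256 graph embeddings in a 64-d latent space,
# and 8 prototypes standing in for global semantic clusters.
embeddings = rng.normal(size=(256, 64))
prototypes = rng.normal(size=(8, 64))
momentum = 0.99  # moving-average rate for the online prototype update

def online_em_step(batch, prototypes, momentum):
    # E-step: assign each embedding in the batch to its nearest prototype.
    dists = np.linalg.norm(batch[:, None, :] - prototypes[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)
    # M-step (online): nudge each prototype toward the mean of its
    # assigned members via an exponential moving average.
    new_protos = prototypes.copy()
    for k in range(prototypes.shape[0]):
        members = batch[assign == k]
        if len(members) > 0:
            new_protos[k] = (momentum * prototypes[k]
                             + (1 - momentum) * members.mean(axis=0))
    return new_protos, assign

prototypes, assign = online_em_step(embeddings, prototypes, momentum)
```

In a full hierarchical version, this step would be repeated at each level of the prototype hierarchy, with higher-level prototypes clustering the ones below them.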