The recent emergence of contrastive learning has facilitated research on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works perform only model-level evaluation and do not explore the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive, and negative data samples (nodes or graphs); (2) an encoder and a readout function to obtain sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust, and an MLP encoder associated with Sum readout can achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension.
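The four-module decomposition above can be sketched as a single contrastive training step. This is a minimal illustrative sketch, not OpenGCL's actual API: the function names, the inner-product discriminator, and the binary cross-entropy estimator are assumptions chosen for simplicity (real GCL models would use a GNN encoder and learned discriminators).

```python
import math
import random

def sampler(nodes):
    """Module 1 (sampler): pick an anchor, a positive, and a negative sample.
    Here the positive is the anchor itself, standing in for an augmented view."""
    anchor = random.choice(nodes)
    positive = anchor
    negative = random.choice([n for n in nodes if n is not anchor])
    return anchor, positive, negative

def encoder(features):
    """Module 2 (encoder): map raw features to an embedding.
    Identity stand-in for a GNN or MLP encoder."""
    return features

def readout(node_embeddings):
    """Module 2 (readout): Sum readout pooling node embeddings into a
    graph-level embedding."""
    return [sum(dims) for dims in zip(*node_embeddings)]

def discriminator(z1, z2):
    """Module 3 (discriminator): score a sample pair via inner product."""
    return sum(a * b for a, b in zip(z1, z2))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def estimator(pos_score, neg_score):
    """Module 4 (estimator): binary cross-entropy contrastive loss that
    pulls the positive score up and pushes the negative score down."""
    return -math.log(sigmoid(pos_score)) - math.log(1.0 - sigmoid(neg_score))

# Usage: one contrastive step over toy 2-d node features.
nodes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
anchor, positive, negative = sampler(nodes)
z_a, z_p, z_n = encoder(anchor), encoder(positive), encoder(negative)
loss = estimator(discriminator(z_a, z_p), discriminator(z_a, z_n))
```

Swapping any one function for an alternative (e.g. a different sampler or estimator) while holding the others fixed is exactly the module-level controlled experiment the framework enables.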