Graph contrastive learning (GCL) has emerged as a dominant technique for graph representation learning, which maximizes the mutual information between paired graph augmentations that share the same semantics. Unfortunately, it is difficult to preserve semantics well during augmentations in view of the diverse nature of graph data. Currently, data augmentations in GCL that are designed to preserve semantics broadly fall into three unsatisfactory categories. First, the augmentations can be manually picked per dataset by trial-and-error. Second, the augmentations can be selected via cumbersome search. Third, the augmentations can be obtained by introducing expensive domain-specific knowledge as guidance. All of these limit the efficiency and the general applicability of existing GCL methods. To circumvent these crucial issues, we propose a \underline{Sim}ple framework for \underline{GRA}ph \underline{C}ontrastive l\underline{E}arning, \textbf{SimGRACE} for brevity, which does not require data augmentations. Specifically, we take the original graph as input and use a GNN model together with its perturbed version as two encoders to obtain two correlated views for contrast. SimGRACE is inspired by the observation that graph data can preserve their semantics well during encoder perturbations while not requiring manual trial-and-error, cumbersome search or expensive domain knowledge for augmentation selection. Also, we explain why SimGRACE can succeed. Furthermore, we devise an adversarial training scheme, dubbed \textbf{AT-SimGRACE}, to enhance the robustness of graph contrastive learning and theoretically explain the reasons. Albeit simple, we show that SimGRACE can yield competitive or better performance compared with state-of-the-art methods in terms of generalizability, transferability and robustness, while enjoying an unprecedented degree of flexibility and efficiency.
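To make the encoder-perturbation idea concrete, below is a minimal sketch of how two correlated views could be obtained without data augmentation: the same batch of graphs is encoded once by a GNN and once by a copy of that GNN whose weights receive small Gaussian noise, and the two sets of embeddings are contrasted with a standard NT-Xent-style objective. The names `encoder`, `batch`, and the hyperparameters `eta` (perturbation scale) and `tau` (temperature) are illustrative assumptions, not values or interfaces from the paper, and the gradient handling is simplified.

```python
# A minimal sketch of the SimGRACE idea, assuming a PyTorch GNN `encoder`
# that maps a batch of graphs to graph-level embeddings (one row per graph).
import copy
import torch
import torch.nn.functional as F

def perturb_encoder(encoder, eta=0.1):
    """Return a copy of the encoder whose weights are perturbed with
    Gaussian noise scaled by each parameter's standard deviation."""
    perturbed = copy.deepcopy(encoder)
    for param in perturbed.parameters():
        noise = torch.randn_like(param) * param.data.std() * eta
        param.data.add_(noise)
    return perturbed

def simgrace_loss(encoder, batch, eta=0.1, tau=0.2):
    """Contrast embeddings of the same graphs produced by the original
    encoder and its perturbed copy (NT-Xent over positive pairs)."""
    z1 = encoder(batch)                            # view 1: original encoder
    with torch.no_grad():                          # simplified: no gradients through the perturbed copy
        z2 = perturb_encoder(encoder, eta)(batch)  # view 2: perturbed encoder
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                        # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)            # i-th graph in view 1 matches i-th in view 2
```

The key design choice illustrated here is that the "augmentation" lives in weight space rather than data space, so no per-dataset augmentation selection, search, or domain knowledge is needed.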