Self-supervised learning (SSL) aims to learn representations of objects without relying on manual labeling. Recently, a number of SSL methods for graph representation learning have achieved performance comparable to state-of-the-art (SOTA) semi-supervised GNNs. A Siamese network, which relies on data augmentation, is the architecture most commonly used in these methods. However, these methods rely on heuristically crafted data augmentation techniques. Furthermore, they use either contrastive terms or other tricks (e.g., asymmetry) to avoid the trivial solutions that can occur in Siamese networks. In this study, we propose GraphSurgeon, a novel SSL method for GNNs with the following features. First, instead of heuristics, we propose a learnable data augmentation method that is jointly learned with the embeddings by leveraging the inherent signal encoded in the graph. In addition, we take advantage of the flexibility of the learnable data augmentation and introduce a new strategy, called post augmentation, that augments in the embedding space. This strategy has a significantly lower memory overhead and run-time cost. Second, as it is difficult to sample truly contrastive terms, we avoid explicit negative sampling. Third, instead of relying on engineering tricks, we use a scalable constrained optimization objective motivated by Laplacian Eigenmaps to avoid trivial solutions. To validate the practical use of GraphSurgeon, we perform an empirical evaluation using 14 public datasets across a number of domains, ranging from small graphs to large-scale graphs with hundreds of millions of edges. Our findings show that GraphSurgeon is comparable to six SOTA semi-supervised baselines and on par with five SOTA self-supervised baselines in node classification tasks. The source code is available at https://github.com/zekarias-tilahun/graph-surgeon.
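
For context, the classical Laplacian Eigenmaps problem that the abstract names as motivation can be written as below. This is only the textbook formulation, not GraphSurgeon's objective: the symbols $W$ (edge-weight matrix), $D$ (diagonal degree matrix), $L = D - W$ (graph Laplacian), and $Y$ (matrix whose rows $y_i$ are node embeddings) are assumptions introduced here for illustration, and the abstract does not specify how the method adapts this problem.

```latex
\begin{aligned}
\min_{Y}\quad & \operatorname{tr}\!\left(Y^{\top} L\, Y\right)
  \;=\; \frac{1}{2}\sum_{i,j} W_{ij}\,\lVert y_i - y_j \rVert^{2} \\
\text{subject to}\quad & Y^{\top} D\, Y = I
\end{aligned}
```

The objective alone is minimized by mapping every node to the same point, so the constraint $Y^{\top} D\, Y = I$ is what excludes that collapsed solution. This is the sense in which a constrained objective can avoid trivial solutions without negative samples.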