We introduce a conceptually simple yet effective model for self-supervised representation learning with graph data. It follows previous methods that generate two views of an input graph through data augmentation. However, unlike contrastive methods that focus on instance-level discrimination, we optimize an innovative feature-level objective inspired by classical Canonical Correlation Analysis. Compared with other works, our approach requires no parameterized mutual information estimator, no additional projector, no asymmetric structures, and, most importantly, no negative samples, which can be costly. We show that the new objective essentially 1) discards augmentation-variant information by learning invariant representations, and 2) prevents degenerate solutions by decorrelating features across different dimensions. Our theoretical analysis further provides an understanding of the new objective, which can be equivalently viewed as an instantiation of the Information Bottleneck principle under the self-supervised setting. Despite its simplicity, our method performs competitively on seven public graph datasets.
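To make the feature-level objective concrete, the following is a minimal sketch of a CCA-style loss of this kind in PyTorch, combining an invariance term between the two views with a decorrelation term on each view's feature dimensions. The function name `cca_ssg_loss`, the standardization scheme, and the trade-off weight `lam` are illustrative assumptions, not the paper's exact implementation or hyperparameters.

```python
import torch

def cca_ssg_loss(z1: torch.Tensor, z2: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """Sketch of a feature-level, CCA-inspired self-supervised loss.

    z1, z2: (N, D) representations of two augmented views of the graph.
    lam: assumed trade-off weight between invariance and decorrelation.
    """
    N, D = z1.shape
    # Standardize each feature dimension to zero mean and unit variance.
    z1 = (z1 - z1.mean(dim=0)) / z1.std(dim=0)
    z2 = (z2 - z2.mean(dim=0)) / z2.std(dim=0)

    # Invariance term: align the two views, discarding
    # augmentation-variant information.
    inv = ((z1 - z2) ** 2).sum() / N

    # Decorrelation term: push each view's feature correlation matrix
    # toward the identity, preventing degenerate (collapsed) solutions.
    eye = torch.eye(D, device=z1.device)
    c1 = (z1.T @ z1) / N
    c2 = (z2.T @ z2) / N
    dec = ((c1 - eye) ** 2).sum() + ((c2 - eye) ** 2).sum()

    return inv + lam * dec
```

Note that this objective operates purely on the two views' feature matrices, which is why no negative samples, mutual information estimator, projector, or asymmetric architecture is needed.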