Contrastive learning has recently achieved remarkable success in many domains, including graphs. However, contrastive loss, especially for graphs, requires a large number of negative samples, which does not scale and is computationally prohibitive, with quadratic time complexity. Sub-sampling is not optimal, and incorrect negative sampling leads to sampling bias. In this work, we propose a meta-node based approximation technique that can (a) proxy all negative combinations, (b) in time quadratic in the number of clusters, (c) at graph level rather than node level, and (d) exploit graph sparsity. By replacing node pairs with additive cluster pairs, we compute the negatives in cluster time at graph level. The resulting Proxy approximated meta-node Contrastive (PamC) loss, based on simple optimized GPU operations, captures the full set of negatives, yet is efficient, with linear time complexity. By avoiding sampling, we effectively eliminate sampling bias. We meet the criterion for a larger number of samples, thus achieving block-contrastiveness, which is proven to outperform pair-wise losses. We use learnt soft cluster assignments for the meta-node construction, and avoid possible heterophily and noise added during edge creation. Theoretically, we show that real-world graphs easily satisfy the conditions necessary for our approximation. Empirically, we show promising accuracy gains over state-of-the-art graph clustering on 6 benchmarks. Importantly, we gain substantially in efficiency: reductions of up to 3x in training time, 1.8x in inference time, and over 5x in GPU memory.
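
The idea of replacing node pairs with additive cluster pairs can be illustrated with a small numerical sketch. This is not the PamC loss itself, whose exact form is not given in the abstract; it only shows the underlying additivity for plain dot-product similarity, where aggregating nodes into meta-nodes with soft assignments preserves the total over all pairs. All names (`meta_node_embeddings`, `pairwise_similarity_sum`, the sizes, and the random data) are hypothetical and chosen for illustration.

```python
import numpy as np

def meta_node_embeddings(z, p):
    """Aggregate node embeddings z (N, d) into K meta-nodes.

    p is an (N, K) soft assignment matrix whose rows sum to 1 (an assumption
    of this sketch); meta-node a is the p-weighted sum of node embeddings.
    """
    return p.T @ z

def pairwise_similarity_sum(x):
    """Sum of dot products over all ordered row pairs, self-pairs included."""
    return float((x @ x.T).sum())

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n_nodes, dim, n_clusters = 200, 16, 5
z = rng.normal(size=(n_nodes, dim))

# Random soft assignments via a row-wise softmax (stand-in for learnt ones).
logits = rng.normal(size=(n_nodes, n_clusters))
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

m = meta_node_embeddings(z, p)              # O(N * K * d) aggregation
node_total = pairwise_similarity_sum(z)     # O(N^2 * d) over node pairs
cluster_total = pairwise_similarity_sum(m)  # O(K^2 * d) over cluster pairs

print(np.isclose(node_total, cluster_total))
```

The two totals agree because each row of `p` sums to 1, so the meta-nodes sum to the same vector as the nodes. A contrastive loss applies a nonlinearity (typically an exponential) to each similarity, so the cluster-level quantity is then only an approximation, which is presumably where the conditions on real-world graphs mentioned in the abstract come in.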