The objective of unsupervised graph representation learning (GRL) is to learn a low-dimensional space of node embeddings that reflects the structure of a given unlabeled graph. Existing algorithms for this task rely on negative sampling objectives that maximize the similarity between embeddings of nearby nodes (referred to as "cohesion") by maintaining positive and negative corpora of node pairs. While positive samples are drawn from node pairs that co-occur in short random walks, conventional approaches construct the negative corpus by uniformly sampling random pairs, thus ignoring valuable information about structural dissimilarity among distant node pairs (referred to as "separation"). In this paper, we present a novel Distance-aware Negative Sampling (DNS) method that maximizes the separation of distant node pairs while maximizing the cohesion of nearby node pairs, by setting the negative sampling probability proportional to the pairwise shortest distances. Our approach can be used in conjunction with any GRL algorithm, and we demonstrate its efficacy over baseline negative sampling methods on downstream node classification tasks across a number of benchmark datasets and GRL algorithms. All our code and datasets are available at https://github.com/Distance-awareNS/DNS/.
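The core idea can be illustrated with a minimal sketch (this is not the authors' released implementation; function names and the unweighted-BFS setting are illustrative assumptions): negatives for a source node are drawn with probability proportional to their shortest-path distance from it, so structurally distant pairs are pushed apart more often than under uniform sampling.

```python
import random
from collections import deque

def shortest_distances(adj, source):
    """BFS shortest-path distances from `source` in an unweighted graph
    given as an adjacency dict {node: [neighbors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_aware_negatives(adj, source, k):
    """Sample k negative nodes for `source`, with sampling probability
    proportional to shortest-path distance (illustrative sketch of the
    distance-aware idea; farther nodes are drawn more often)."""
    dist = shortest_distances(adj, source)
    candidates = [v for v in dist if v != source]
    weights = [dist[v] for v in candidates]
    return random.choices(candidates, weights=weights, k=k)
```

In this sketch, a node at distance 3 is three times as likely to be drawn as a direct neighbor; a uniform sampler would treat both the same, discarding the separation signal the abstract refers to.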