Node embedding learns a low-dimensional representation for each node in the graph. Recent progress on node embedding shows that proximity matrix factorization methods achieve superior performance and scale to large graphs with millions of nodes. Existing approaches first define a proximity matrix and then learn embeddings that fit the proximity via matrix factorization. Most existing matrix factorization methods adopt the same proximity for different tasks, yet it has been observed that different tasks and datasets may require different proximities, limiting their representation power. Motivated by this, we propose {\em Lemane}, a framework with trainable proximity measures that can be automatically learned to best suit the dataset and task at hand. Our method is end-to-end: it incorporates a differentiable SVD in the pipeline so that the parameters can be trained via backpropagation. However, this learning process is still expensive on large graphs. To improve scalability, we train the proximity measures only on carefully subsampled graphs, and then apply standard proximity matrix factorization on the original graph using the learned proximity. Note that computing the learned proximities for each node pair is still expensive on large graphs, and existing techniques for computing proximities are not applicable to the learned proximities. Thus, we present generalized push techniques to make our solution scalable to large graphs with millions of nodes. Extensive experiments show that our proposed solution outperforms existing solutions on both link prediction and node classification tasks on almost all datasets.
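To make the proximity-matrix-factorization paradigm concrete, here is a minimal sketch (not Lemane itself): it builds a hand-picked personalized-PageRank-style proximity matrix for a toy graph and factorizes it with SVD to obtain node embeddings. The toy adjacency matrix, the decay parameter `alpha`, and the truncation length `T` are illustrative assumptions; the paper's point is that such fixed proximities should instead be learned per task.

```python
import numpy as np

# Toy graph: adjacency matrix of a 4-node undirected graph (assumed example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Row-normalized transition matrix.
P = A / A.sum(axis=1, keepdims=True)

# A fixed proximity matrix: a truncated personalized-PageRank-style series
# sum_t alpha * (1 - alpha)^t * P^t. The weights alpha*(1-alpha)^t are exactly
# the kind of hand-picked choice that a trainable proximity would replace.
alpha, T = 0.15, 10
S = sum(alpha * (1 - alpha) ** t * np.linalg.matrix_power(P, t) for t in range(T))

# Factorize the proximity matrix with SVD to get d-dimensional embeddings.
d = 2
U, sigma, Vt = np.linalg.svd(S)
emb = U[:, :d] * np.sqrt(sigma[:d])  # one embedding row per node
print(emb.shape)  # (4, 2)
```

In an end-to-end variant, the per-hop weights would be trainable parameters and the SVD would be a differentiable surrogate so gradients can flow back through the factorization.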