Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
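The abstract does not spell out the closed forms. For DeepWalk with window size T and b negative samples, the matrix derived in the paper is log( vol(G)/(bT) · (Σ_{r=1..T} (D^{-1}A)^r) · D^{-1} ), where A is the adjacency matrix, D the degree matrix, and vol(G) the sum of all degrees. The following is a minimal dense sketch of the small-window NetMF idea (element-wise truncation at 1, logarithm, then a truncated SVD). The function names `deepwalk_matrix` and `netmf_embedding` are hypothetical, and the code assumes an undirected graph with no isolated nodes that is small enough for dense NumPy arrays; it is not the authors' implementation.

```python
import numpy as np

def deepwalk_matrix(A, T=3, b=1.0):
    """Return vol(G)/(b*T) * (sum_{r=1..T} (D^-1 A)^r) D^-1 as a dense array.

    Assumes A is a symmetric adjacency matrix with no zero-degree nodes.
    """
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    vol = degrees.sum()
    D_inv = np.diag(1.0 / degrees)
    P = D_inv @ A                      # random-walk transition matrix
    P_power = np.eye(A.shape[0])
    S = np.zeros_like(A)
    for _ in range(T):                 # accumulate P^1 + ... + P^T
        P_power = P_power @ P
        S += P_power
    return (vol / (b * T)) * (S @ D_inv)

def netmf_embedding(A, T=3, b=1.0, dim=2):
    """Factorize log(max(M, 1)) with an SVD and return U_d * sqrt(Sigma_d)."""
    M = deepwalk_matrix(A, T=T, b=b)
    log_M = np.log(np.maximum(M, 1.0))  # truncation keeps the log finite
    U, sigma, _ = np.linalg.svd(log_M)
    return U[:, :dim] * np.sqrt(sigma[:dim])

if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    print(netmf_embedding(A, T=3, b=1.0, dim=2))
```

With T = 1 the matrix inside the logarithm reduces to vol(G)/b · D^{-1} A D^{-1}, which corresponds to claim (2) above: LINE as the special case of DeepWalk with a context size of one.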


Related content

DeepWalk


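DeepWalk is one of the earliest node embedding models based on word2vec. Its main idea is to construct random-walk paths over the network to mimic the process of text generation, producing sequences of nodes. It then uses Skip-gram with hierarchical softmax to model the probability of the node pairs inside each local window of a random-walk sequence, maximizes the likelihood of the random-walk sequences, and learns the parameters with stochastic gradient descent.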
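The first two steps described above (sampling random walks and extracting node pairs from local windows) can be sketched as follows. This is an illustrative sketch only: the helper names `random_walk` and `skipgram_pairs` and the adjacency-list input format are assumptions, and the Skip-gram training step with hierarchical softmax and stochastic gradient descent is left out.

```python
import random

def random_walk(adj, start, length, rng):
    """Sample a uniform random walk of at most `length` nodes from `start`.

    `adj` maps each node to a list of its neighbors (assumed input format).
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:              # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def skipgram_pairs(walk, window):
    """Return (center, context) pairs for every node within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    rng = random.Random(0)
    corpus = [random_walk(adj, node, length=5, rng=rng) for node in adj]
    training_pairs = [p for walk in corpus for p in skipgram_pairs(walk, window=2)]
    print(len(training_pairs))
```

The resulting pairs play the role that (word, context) pairs play in word2vec; an embedding would be obtained by feeding the walks, treated as sentences, to a Skip-gram trainer.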

[Graph Neural Networks (GNN) for Structured Data Analysis]

Zhuanzhi Member Service

117+ reads · March 22, 2020

[WWW 2020, MAGNN] MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

Zhuanzhi Member Service

116+ reads · February 10, 2020

[Graph Machine Learning Paper] A Survey on Network Embedding

Zhuanzhi Member Service

81+ reads · December 16, 2019

[Graph Machine Learning Paper] Graph Embedding Techniques, Applications, and Performance: A Survey

Zhuanzhi Member Service

73+ reads · December 16, 2019