Since the invention of word2vec, the skip-gram model has significantly advanced research on network embedding, as seen in the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of these models, when trained with negative sampling, can be unified into a matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) as an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec factorizes a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of the graph Laplacian. Finally, we present the NetMF method, as well as its approximation algorithm, for computing network embeddings. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
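To make the matrix factorization view concrete, the following is a minimal dense sketch, not the authors' implementation. It assumes the closed form usually associated with DeepWalk under negative sampling, log((vol(G) / (bT)) (sum_{r=1}^{T} (D^{-1} A)^r) D^{-1}), where A is the adjacency matrix, D the degree matrix, T the context window size, and b the number of negative samples. The names `deepwalk_matrix` and `netmf_embed` and the parameters `window`, `neg`, and `dim` are hypothetical, the graph is assumed to be undirected with no isolated vertices, and the element-wise truncation at 1 before the logarithm is an assumption made so that the logarithm stays finite.

```python
import numpy as np


def deepwalk_matrix(A, window=1, neg=1.0):
    """Dense matrix (vol / (neg * window)) * sum_{r=1..window} (D^-1 A)^r * D^-1.

    Assumes an undirected graph with no isolated vertices, so every degree
    is positive and the inverse degree matrix exists.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    vol = deg.sum()                      # volume of the graph: sum of all degrees
    D_inv = np.diag(1.0 / deg)
    P = D_inv @ A                        # random-walk transition matrix
    power = np.eye(A.shape[0])
    acc = np.zeros_like(A)
    for _ in range(window):
        power = power @ P                # P^1, P^2, ..., P^window
        acc += power
    return (vol / (neg * window)) * acc @ D_inv


def netmf_embed(A, window=1, neg=1.0, dim=2):
    """Embed vertices by a truncated SVD of the element-wise log of the matrix."""
    M = deepwalk_matrix(A, window=window, neg=neg)
    log_M = np.log(np.maximum(M, 1.0))   # truncate at 1 so the log is finite and >= 0
    U, s, _ = np.linalg.svd(log_M)
    return U[:, :dim] * np.sqrt(s[:dim]) # scale the top singular vectors
```

With `window=1` the matrix reduces to (vol(G) / b) D^{-1} A D^{-1}, which matches the statement that LINE is the special case of DeepWalk with a context size of one. The dense matrix powers here only suit small graphs; a large network would need the approximation algorithm mentioned in the abstract, which this sketch does not attempt.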