Random-walk based network embedding algorithms like DeepWalk and node2vec are widely used to obtain Euclidean representation of the nodes in a network prior to performing downstream inference tasks. However, despite their impressive empirical performance, there is a lack of theoretical results explaining their large-sample behavior. In this paper, we study node2vec and DeepWalk through the perspective of matrix factorization. In particular, we analyze these algorithms in the setting of community detection for stochastic blockmodel graphs (and their degree-corrected variants). By exploiting the row-wise uniform perturbation bound for leading singular vectors, we derive high-probability error bounds between the matrix factorization-based node2vec/DeepWalk embeddings and their true counterparts, uniformly over all node embeddings. Based on strong concentration results, we further show the perfect membership recovery by node2vec/DeepWalk, followed by $K$-means/medians algorithms. Specifically, as the network becomes sparser, our results guarantee that with large enough window size and vertices number, applying $K$-means/medians on the matrix factorization-based node2vec embeddings can, with high probability, correctly recover the memberships of all vertices in a network generated from the stochastic blockmodel (or its degree-corrected variants). The theoretical justifications are mirrored in the numerical experiments and real data applications, for both the original node2vec and its matrix factorization variant.
翻译:以随机行为基础的网络嵌入算法, 如 DeepWalk 和 node2vec 和 node2vec 等已被广泛用于获取 Euclide 在下游测谎任务之前, 在网络中显示节点的 Euclide 代表 。 然而,尽管它们的实验性表现令人印象深刻, 却缺乏理论性结果来解释其大量抽样行为。 在本文中, 我们从矩阵的系数化角度, 研究节点2vec 和 DecrewWalk 等嵌入的算法。 特别是, 我们分析这些算法, 在为随机的区块模型图形( 及其经度修正的变异体) 设置社区检测器( 及其经度修正的变异体 ) 。 具体地说, 通过对正向的正向统一的正向一致的矩阵化, 我们得出高概率差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差