A common approach to solving prediction tasks on large networks, such as node classification or link prediction, begins by learning a Euclidean embedding of the nodes of the network, to which traditional machine learning methods can then be applied. This includes methods such as DeepWalk and node2vec, which learn embeddings by optimizing stochastic losses formed over subsamples of the graph at each iteration of stochastic gradient descent. In this paper, we study the effects of adding an $\ell_2$ penalty on the embedding vectors to the training loss of these types of methods. We prove that, under some exchangeability assumptions on the graph, this asymptotically leads to learning a graphon with a nuclear-norm-type penalty, and we give guarantees for the asymptotic distribution of the learned embedding vectors. In particular, the exact form of the penalty depends on the choice of subsampling method used within stochastic gradient descent. We also illustrate empirically that concatenating node covariates to $\ell_2$-regularized node2vec embeddings yields performance comparable, if not superior, to that of methods which incorporate node covariates and the network structure in a non-linear manner.
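To make the regularization scheme concrete, the following is a minimal sketch, assuming a skip-gram-style embedding loss $\mathcal{L}$ formed over subsampled node pairs and writing $\omega_i \in \mathbb{R}^d$ for the embedding of node $i$ (the notation here is illustrative rather than the paper's own):
\[
\min_{\omega_1, \dots, \omega_n \in \mathbb{R}^d} \; \mathcal{L}(\omega_1, \dots, \omega_n) \;+\; \lambda \sum_{i=1}^n \lVert \omega_i \rVert_2^2 .
\]
The link to nuclear-norm penalization can be anticipated from the classical variational identity $\lVert M \rVert_* = \min_{M = UV^\top} \tfrac{1}{2}\big( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \big)$: an $\ell_2$ penalty on the rows of the embedding factors acts, at the level of the induced low-rank matrix, as a nuclear-norm-type penalty.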