A common approach to solving tasks such as node classification or link prediction on a large network begins by learning a Euclidean embedding of the network's nodes, from which regular machine learning methods can be applied. For unsupervised random-walk methods such as DeepWalk and node2vec, adding an $\ell_2$ penalty on the embedding vectors to the loss leads to improved downstream task performance. In this paper we study the effects of this regularization and prove that, under exchangeability assumptions on the graph, it asymptotically leads to learning a graphon with a nuclear-norm-type penalty. In particular, the exact form of the penalty depends on the choice of subsampling method used within stochastic gradient descent to learn the embeddings. We also illustrate empirically that concatenating node covariates to $\ell_2$-regularized node2vec embeddings yields performance comparable, if not superior, to methods that incorporate node covariates and the network structure in a non-linear manner.
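To make the regularized objective concrete, the sketch below shows one stochastic gradient step on a skip-gram-with-negative-sampling loss of the node2vec type, with the $\ell_2$ penalty added on the embedding vectors that appear in the minibatch. This is a minimal illustration assuming a PyTorch implementation; all names and hyperparameter values (`n_nodes`, `dim`, `lam`, the learning rate, batch sizes) are illustrative choices, not the paper's, and the minibatch here is stubbed with random indices rather than actual random-walk co-occurrences.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes/hyperparameters for illustration only.
n_nodes, dim, lam = 1000, 128, 0.1   # lam scales the l2 penalty

emb = torch.nn.Embedding(n_nodes, dim)   # center ("input") embeddings
ctx = torch.nn.Embedding(n_nodes, dim)   # context ("output") embeddings
opt = torch.optim.SGD(
    list(emb.parameters()) + list(ctx.parameters()), lr=0.025
)

def sgd_step(centers, positives, negatives):
    """One step on a minibatch of random-walk co-occurrences.

    centers, positives: (B,) node indices of co-occurring pairs;
    negatives: (B, K) node indices drawn from a negative-sampling
    distribution. How this minibatch is subsampled is what determines
    the exact form of the asymptotic penalty discussed above.
    """
    u = emb(centers)        # (B, d)
    v_pos = ctx(positives)  # (B, d)
    v_neg = ctx(negatives)  # (B, K, d)

    # Skip-gram with negative sampling: reward large inner products for
    # co-occurring pairs, penalize them for negative samples.
    pos_term = F.logsigmoid((u * v_pos).sum(-1))                        # (B,)
    neg_term = F.logsigmoid(-torch.einsum("bkd,bd->bk", v_neg, u)).sum(-1)

    # l2 penalty on the embedding vectors appearing in the batch.
    reg = lam * (u.pow(2).sum(-1) + v_pos.pow(2).sum(-1))

    loss = (-(pos_term + neg_term) + reg).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random indices standing in for sampled walk data.
B, K = 512, 5
sgd_step(
    torch.randint(n_nodes, (B,)),
    torch.randint(n_nodes, (B,)),
    torch.randint(n_nodes, (B, K)),
)
```

For the downstream experiments described above, the learned embedding matrix would then be concatenated column-wise with the node covariate matrix before being passed to a standard classifier.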