DeepWalk: Online Learning of Social Representations

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide $F_1$ scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.


Related content

DeepWalk


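DeepWalk is the earliest proposed node embedding model based on Word2vec. Its main idea is to construct random-walk paths over the nodes of a network, mimicking the way text is generated, so that each walk yields a sequence of nodes. It then uses Skip-gram with Hierarchical Softmax to model the probability of node pairs within each local window of a random-walk sequence, maximizes the likelihood of the random-walk sequences, and finally learns the parameters with stochastic gradient descent.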
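The walk-generation and windowing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the adjacency-list `Graph` type, the function names `random_walk`, `build_corpus`, and `window_pairs`, the toy graph, and all parameter values are assumptions chosen for illustration. It stops at producing (center, context) node pairs; training Skip-gram with Hierarchical Softmax on those pairs is left to a Word2vec-style trainer.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy representation: node id -> list of neighbor ids.
Graph = Dict[int, List[int]]


def random_walk(graph: Graph, start: int, walk_length: int,
                rng: random.Random) -> List[int]:
    """Truncated random walk: pick a uniform random neighbor at each step."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph: Graph, walks_per_node: int, walk_length: int,
                 seed: int = 0) -> List[List[int]]:
    """Start several walks from every node; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


def window_pairs(walk: List[int], window: int) -> List[Tuple[int, int]]:
    """Skip-gram style (center, context) pairs within a local window of the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Assumed toy graph (undirected, stored as symmetric adjacency lists).
toy_graph: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = build_corpus(toy_graph, walks_per_node=2, walk_length=5)
pairs = [p for walk in corpus for p in window_pairs(walk, window=2)]
```

Because every walk is simply a list of node ids, the resulting `corpus` has the same shape as a list of tokenized sentences, which is what lets a language-modeling trainer be reused on graph data.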
