We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide $F_1$ scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
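To make the core idea concrete, the sketch below samples truncated random walks from a graph and feeds them, as if they were sentences, to a Skip-gram language model. This is a minimal illustration of the approach described above, not the authors' reference implementation; the libraries (networkx, gensim) and parameter values (walk length, window size, embedding dimension) are assumptions chosen for brevity.

```python
# Minimal DeepWalk-style sketch: random walks treated as sentences,
# embedded with Skip-gram. Library and parameter choices are illustrative.
import random

import networkx as nx
from gensim.models import Word2Vec


def random_walk(graph, start, walk_length):
    """Return one truncated random walk as a list of vertex ids (as strings)."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(v) for v in walk]


def deepwalk_embeddings(graph, walks_per_vertex=10, walk_length=40, dim=64):
    """Learn latent vertex representations from a corpus of random walks."""
    corpus = []
    for _ in range(walks_per_vertex):
        vertices = list(graph.nodes())
        random.shuffle(vertices)  # one pass over the vertices in random order
        corpus.extend(random_walk(graph, v, walk_length) for v in vertices)
    # Skip-gram (sg=1) treats each walk as a sentence of vertex "words".
    model = Word2Vec(corpus, vector_size=dim, window=5, sg=1,
                     min_count=0, workers=4)
    return {v: model.wv[str(v)] for v in graph.nodes()}


if __name__ == "__main__":
    # Example: embed a small social graph and inspect one vertex's vector.
    G = nx.karate_club_graph()
    embeddings = deepwalk_embeddings(G)
    print(embeddings[0][:5])
```

The learned vectors can then serve as features for a downstream classifier in multi-label node classification, which is how the representations are evaluated in the experiments summarized above.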