Embedding network data into a low-dimensional vector space has shown promising performance for many real-world applications, such as node classification and entity retrieval. However, most existing methods have focused only on leveraging network structure. In social networks, besides the network structure, there also exists rich information about social actors, such as the user profiles of friendship networks and the textual content of citation networks. This rich attribute information reveals the homophily effect, which exerts a huge impact on the formation of social networks. In this paper, we explore attributes as a rich evidence source in social networks to improve network embedding. We propose a generic Social Network Embedding framework (SNE), which learns representations for social actors (i.e., nodes) by preserving both structural proximity and attribute proximity. While structural proximity captures the global network structure, attribute proximity accounts for the homophily effect. To justify our proposal, we conduct extensive experiments on four real-world social networks. Compared to state-of-the-art network embedding approaches, SNE learns more informative representations, achieving substantial gains on the tasks of link prediction and node classification. Specifically, SNE significantly outperforms node2vec, with an 8.2% relative improvement on the link prediction task and a 12.7% gain on the node classification task.