Recent advances in network embedding have shown that low-dimensional network representations play a critical role in network analysis. However, most existing network embedding methods do not flexibly incorporate auxiliary information such as the content and labels of nodes. In this paper, we take a matrix factorization perspective on network embedding and incorporate the structure, content, and label information of the network simultaneously. For structure, we validate that the matrix we construct preserves the high-order proximities of the network. Label information can be further integrated into the matrix via random walk sampling, which enhances the quality of the embedding in an unsupervised manner, i.e., without leveraging downstream classifiers. In addition, we generalize the Skip-Gram Negative Sampling model to integrate the content of the network in a matrix factorization framework. As a consequence, network embeddings can be learned in a unified framework that integrates network structure, node content, and label information. We demonstrate the efficacy of the proposed model on semi-supervised node classification and link prediction across a variety of real-world benchmark network datasets.
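As a generic illustration of the matrix factorization view of network embedding, the following minimal sketch builds a high-order proximity matrix from powers of the random-walk transition matrix, applies a shifted positive PMI transform (the kind of matrix Skip-Gram Negative Sampling is known to factorize implicitly), and factorizes it with an SVD. This is not the matrix or model proposed in this paper: the function names, the `window` and `neg` parameters, and the toy graph are all assumptions made for illustration, and content and label information are omitted.

```python
import numpy as np

def high_order_proximity(adj, window=3):
    """Average of the first `window` powers of the row-normalized transition matrix.

    Illustrative assumption: a simplified stand-in for a high-order proximity matrix.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    trans = adj / deg
    power = np.eye(adj.shape[0])
    total = np.zeros_like(adj)
    for _ in range(window):
        power = power @ trans  # k-step transition probabilities
        total += power
    return total / window

def sppmi(prox, neg=1.0):
    """Shifted positive PMI of a nonnegative co-occurrence-like matrix."""
    total = prox.sum()
    row = prox.sum(axis=1, keepdims=True)
    col = prox.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(prox * total / (row * col))
    pmi = np.where(prox > 0, pmi, -np.inf)  # zero co-occurrence -> clipped to 0 below
    return np.maximum(pmi - np.log(neg), 0.0)

def embed(adj, dim=2, window=3, neg=1.0):
    """Factorize the SPPMI matrix with an SVD and keep the top `dim` components."""
    m = sppmi(high_order_proximity(adj, window), neg)
    u, s, _ = np.linalg.svd(m)
    return u[:, :dim] * np.sqrt(s[:dim])

# Hypothetical toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

emb = embed(adj, dim=2)
print(emb.shape)
```

Larger `window` values mix in longer walks, which is one simple way to expose higher-order structure to the factorization; a practical implementation would use sparse matrices and a truncated SVD.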