A well-established family of unsupervised node embedding methods can be interpreted as consisting of two distinct steps: i) the definition of a similarity matrix based on the graph of interest, followed by ii) an explicit or implicit factorization of that matrix. Inspired by this viewpoint, we propose improvements to both steps of the framework. On the one hand, we propose to encode node similarities based on the free energy distance, which interpolates between the shortest path and the commute time distances, thus providing an additional degree of flexibility. On the other hand, we propose a matrix factorization method based on a loss function that generalizes that of the skip-gram model with negative sampling to arbitrary similarity matrices. Compared with factorizations based on the widely used $\ell_2$ loss, the proposed method better preserves node pairs associated with higher similarity scores. Moreover, it can be easily implemented using automatic differentiation toolkits and computed efficiently by leveraging GPU resources. Node clustering, node classification, and link prediction experiments on real-world datasets demonstrate the effectiveness of incorporating free-energy-based similarities, as well as of the proposed matrix factorization, compared with state-of-the-art alternatives.
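To make the two-step view concrete, the following is a minimal sketch rather than the method proposed here. It assumes a toy graph whose adjacency matrix stands in for the similarity matrix (the free energy distance is not implemented), and it assumes one plausible form of a skip-gram-with-negative-sampling style loss over a nonnegative similarity matrix $S$: $-\sum_{ij} [S_{ij} \log\sigma(u_i^\top v_j) + N_{ij} \log\sigma(-u_i^\top v_j)]$, with negative weights $N_{ij} = k\, r_i c_j / \sum S$ built from the row and column sums of $S$. The function names, the parameter `k`, and the plain gradient descent loop are illustrative choices; gradients are written by hand in NumPy instead of using an automatic differentiation toolkit.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def sgns_like_loss(S, N, U, V):
    # Assumed loss: positive pairs weighted by S, negative pairs weighted by N.
    X = U @ V.T
    return -np.sum(S * log_sigmoid(X) + N * log_sigmoid(-X))


def factorize(S, dim=2, k=1.0, lr=0.05, steps=300, seed=0):
    # Step ii): implicit factorization of S by gradient descent on the loss above.
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    U = 0.1 * rng.standard_normal((n, dim))
    V = 0.1 * rng.standard_normal((n, dim))
    # Hypothetical negative-sampling weights from the marginals of S.
    N = k * np.outer(S.sum(axis=1), S.sum(axis=0)) / S.sum()
    history = [sgns_like_loss(S, N, U, V)]
    for _ in range(steps):
        X = U @ V.T
        # dLoss/dX = (S + N) * sigmoid(X) - S
        G = (S + N) / (1.0 + np.exp(-X)) - S
        # Simultaneous update: both right-hand sides use the old U and V.
        U, V = U - lr * G @ V, V - lr * G.T @ U
        history.append(sgns_like_loss(S, N, U, V))
    return U, V, history


# Step i): a placeholder similarity matrix, here the adjacency matrix of
# two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
S = np.zeros((6, 6))
for i, j in edges:
    S[i, j] = S[j, i] = 1.0

U, V, history = factorize(S)
print("loss before:", history[0], "loss after:", history[-1])
```

Under this assumed loss, each entry's contribution is weighted by its similarity score, so the fit concentrates on high-similarity pairs, whereas an $\ell_2$ objective penalizes the squared error of every entry equally.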