Graph embedding, representing local and global neighborhood information by numerical vectors, is a crucial part of the mathematical modeling of a wide range of real-world systems. Among the embedding algorithms, random walk-based algorithms have proven to be very successful. These algorithms collect information by creating numerous random walks with a redefined number of steps. Creating random walks is the most demanding part of the embedding process. The computation demand increases with the size of the network. Moreover, for real-world networks, considering all nodes on the same footing, the abundance of low-degree nodes creates an imbalanced data problem. In this work, a computationally less intensive and node connectivity aware uniform sampling method is proposed. In the proposed method, the number of random walks is created proportionally with the degree of the node. The advantages of the proposed algorithm become more enhanced when the algorithm is applied to large graphs. A comparative study by using two networks namely CORA and CiteSeer is presented. Comparing with the fixed number of walks case, the proposed method requires 50% less computational effort to reach the same accuracy for node classification and link prediction calculations.
翻译:图形嵌入,通过数字矢量代表本地和全球周边信息,是一系列真实世界系统数学模型的关键部分。 在嵌入式算法中,随机步行算法证明非常成功。 这些算法通过创建众多随机行走和若干步骤重新定义来收集信息。 创建随机行走是嵌入过程最需要的部分。 计算需求随着网络的大小而增加。 此外, 对于真实世界网络来说,考虑到同一基础上的所有节点, 大量低度节点造成了数据不平衡的问题。 在这项工作中, 提出了一个计算强度较小且节点连接意识统一抽样方法。 在拟议方法中, 随机行走的数量与节点的程度成比例成正比。 当将算法应用到大图表时, 提议的算法的优点会变得更强。 使用两个网络( CORA 和 CiteSeer) 进行比较研究 。 与固定行道数量相比, 拟议的方法需要减少50%的计算努力, 以达到相同的节点分类和链接预测的精确度计算。