Graph embedding techniques have attracted growing interest because they convert graph data into a continuous, low-dimensional space. Effective graph analytics gives users a deeper understanding of what lies behind the data and can therefore benefit a variety of machine learning tasks. At the scale of current real-world applications, however, most graph analytics methods suffer from high computation and space costs. Existing methods and systems can process networks with thousands to a few million nodes, but scaling to large-scale networks remains a challenge. The computational cost of training graph embedding systems calls for accelerators such as GPUs. In this paper, we introduce a hybrid CPU-GPU framework that addresses the challenges of learning embeddings of large-scale graphs. We compare the performance of our method qualitatively and quantitatively with existing embedding systems on common benchmarks. We also show that our system can scale training to datasets an order of magnitude larger than a single machine's total memory capacity. The effectiveness of the learned embeddings is evaluated on multiple downstream applications, and the experimental results demonstrate their quality in terms of both performance and accuracy.