Recent advances in data processing have stimulated the demand for learning on graphs of very large scale. Graph Neural Networks (GNNs), an emerging and powerful approach to graph learning tasks, are known to be difficult to scale up. Most scalable models apply node-based techniques to simplify the expensive message-passing propagation procedure of GNNs. However, we find such acceleration insufficient when applied to million- or even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with feature-oriented optimization for graph computation. SCARA efficiently computes graph embeddings from node features, and further selects and reuses feature computation results to reduce overhead. Theoretical analysis shows that our model achieves sub-linear time complexity with guaranteed precision in the propagation process as well as in GNN training and inference. We conduct extensive experiments on various datasets to evaluate the efficacy and efficiency of SCARA. Comparison with baselines shows that SCARA achieves up to 100x faster graph propagation than current state-of-the-art methods, with fast convergence and comparable accuracy. Most notably, it completes precomputation on Papers100M (111M nodes, 1.6B edges), the largest available billion-scale GNN dataset, in 100 seconds.
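To make the "feature-oriented" idea concrete, here is a minimal sketch of what propagating node features column by column, and reusing a cached propagation result when one column is close to a rescaling of an already-propagated one, could look like. This is an illustrative toy (function names, the PPR-style propagation form, and the similarity test are our assumptions for exposition, not the paper's actual algorithm):

```python
import numpy as np

def feature_propagate(adj_norm, x_col, alpha=0.15, n_hops=10):
    # PPR-style propagation of one feature column:
    # approximates sum_l alpha * (1-alpha)^l * A_hat^l * x_col
    result = np.zeros_like(x_col)
    residue = x_col.copy()
    for _ in range(n_hops):
        result += alpha * residue
        residue = (1.0 - alpha) * (adj_norm @ residue)
    return result + residue  # fold the remaining tail mass back in

def propagate_with_reuse(adj_norm, features, alpha=0.15, tol=0.5):
    # Feature-oriented propagation with reuse: if a column is close to a
    # scalar rescaling of an already-propagated "base" column, reuse that
    # cached result (propagation is linear) instead of recomputing it.
    n, d = features.shape
    out = np.zeros((n, d))
    bases = []  # cache of (column, propagated_result) pairs
    for j in range(d):
        col = features[:, j].astype(float)
        reused = False
        for base_col, base_res in bases:
            denom = base_col @ base_col
            if denom == 0.0:
                continue
            c = (col @ base_col) / denom  # best scalar fit to the base
            if np.linalg.norm(col - c * base_col) <= tol * np.linalg.norm(col):
                out[:, j] = c * base_res  # linearity makes the reuse exact
                reused = True
                break
        if not reused:
            res = feature_propagate(adj_norm, col, alpha)
            out[:, j] = res
            bases.append((col, res))
    return out
```

The reuse step is sound because the propagation operator is linear in the input column: propagating `c * x` yields `c *` (the propagation of `x`), so a cached result can be rescaled instead of running the expensive graph traversal again.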