We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state.We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the amount of used locks, atomics, and reads/writes. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The conducted analysis illustrates surprising differences between push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed up by performance data from hardware counters.We use these findings to illustrate which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively-parallel shared-memory machines as well as distributed-memory systems.
翻译:我们通过分析处理图表的最快方法来降低通信和同步处理图案的成本:将更新推到共同状态或将更新推到私人状态。 我们调查这种推拉分法对各种算法的适用性及其对复杂度、性能、用过的锁的数量、原子和读/写量的影响。 我们考虑11个图形算法、3个编程模型、2个图形抽象以及各种图表的种类。 所进行的分析显示了不同算法在性能、趋同速度和代码复杂性方面的推和拉变量之间的惊人差异; 洞见得到硬件计数器的性能数据的支持。 我们利用这些调查结果来说明每种算法的变异性更快, 并制订通用战略, 甚至能够加快速度。 我们的洞察力可以用来加速关于大规模平行共享的机器以及分布式模拟系统的图形处理引擎或图书馆。