Iterative graph algorithms often compute intermediate values and update them as computation progresses. Updated output values are used as inputs for computations in the current or subsequent iterations; hence the number of iterations required for values to converge can potentially be reduced if the newest values are asynchronously made available to other updates computed in the same iteration. In a multi-threaded shared memory system, however, the immediate propagation of updated values can cause memory contention that may offset the benefit of propagating updates sooner: the benefit of fewer iterations may be diminished by each iteration taking longer. Our key idea is to combine the low memory contention of synchronous approaches with the faster information sharing of asynchronous approaches. Our hybrid approach buffers updates locally in each thread before committing them to the global store, controlling how often threads cause conflicts for others while still sharing data within one iteration and hence speeding convergence. On a 112-thread CPU system, our hybrid approach attains a 4.5%-19.4% speedup over an asynchronous approach for PageRank and a 1.9%-17% speedup over asynchronous Bellman-Ford SSSP. Further, our hybrid approach attains 2.56x better performance than the synchronous approach. Finally, we provide insights as to why delaying updates is not helpful on certain graphs where connectivity is clustered on the main diagonal of the adjacency matrix.