The Bulk Synchronous Parallel(BSP) computational model has emerged as the dominant distributed framework to build large-scale iterative graph processing systems. While its implementations(e.g., Pregel, Giraph, and Hama) achieve high scalability, frequent synchronization and communication among the workers can cause substantial parallel inefficiency. To help address this critical concern, this paper introduces the GraphHP(Graph Hybrid Processing) platform which inherits the friendly vertex-centric BSP programming interface and optimizes its synchronization and communication overhead. To achieve the goal, we first propose a hybrid execution model which differentiates between the computations within a graph partition and across the partitions, and decouples the computations within a partition from distributed synchronization and communication. By implementing the computations within a partition by pseudo-superstep iteration in memory, the hybrid execution model can effectively reduce synchronization and communication overhead while not requiring heavy scheduling overhead or graph-centric sequential algorithms. We then demonstrate how the hybrid execution model can be easily implemented within the BSP abstraction to preserve its simple programming interface. Finally, we evaluate our implementation of the GraphHP platform on classical BSP applications and show that it performs significantly better than the state-of-the-art BSP implementations. Our GraphHP implementation is based on Hama, but can easily generalize to other BSP platforms.
翻译:散装同步平行(BSP)计算模型已成为建立大型迭代图处理系统的主要分布框架。虽然其实施(例如Pregel、Giraph和Hama)实现了高度可缩放性,但工人之间经常同步和沟通可导致大量平行的低效率。为了帮助解决这一重大关切,本文件介绍了GreaphHP(GrapHP(Graph 混合处理)平台,该平台继承了友好的脊椎中心 BSP编程界面,并优化了其同步和通信管理。为了实现这一目标,我们首先提议了一个混合执行模型,该模型将图形分区和跨分区的计算区分开来,并将计算与分布的同步和通信分隔开来。通过在记忆中以假超超的重复方式进行分割进行计算,混合执行模型可以有效地减少同步和通信管理,同时不要求大量列表式的间接费用或以图形为中心的连续算法。然后我们演示如何在BSP的抽象版中轻易地实施该模型,以保存其简单的编程界面。最后,我们评估了在古典BSP应用程序上实施的GreaHP平台执行情况比其他的进度要好。