The subgraph-centric programming model is a promising approach and has been applied in many state-of-the-art distributed graph computing frameworks. However, traditional graph partition algorithms have significant difficulties in processing large-scale power-law graphs. The major problem is the communication bottleneck found in many subgraph-centric frameworks. Detailed analysis indicates that this bottleneck is caused by a huge communication volume or by extreme message imbalance among the partitioned subgraphs. Traditional partition algorithms do not consider both factors at the same time, especially on power-law graphs. In this paper, we propose a novel efficient and balanced vertex-cut graph partition algorithm (EBV) that assigns appropriate weights to the overall communication cost and the communication balance. We observe that the number of replicated vertices and the balance of edge and vertex assignment greatly influence the communication patterns of distributed subgraph-centric frameworks, which in turn affect overall performance. Based on this insight, we design an evaluation function that takes the proportion of replicated vertices and the balance of edge and vertex assignments as its key parameters. In addition, we process edges in ascending order of the sum of their end-vertices' degrees. Experiments show that EBV reduces the replication factor and communication by at least 21.8% and 23.7%, respectively, compared with other self-based partition algorithms. When deployed in a subgraph-centric framework, it reduces the running time on power-law graphs by an average of 16.8% compared with the state-of-the-art partition algorithm. Our results indicate that EBV has great potential to improve the performance of subgraph-centric frameworks for parallel processing of large-scale power-law graphs.
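The two ideas described in the abstract, an evaluation function that weighs replicated vertices against edge and vertex balance, and edge processing in ascending order of end-vertex degree sum, can be illustrated with a minimal greedy sketch. The abstract does not give the exact evaluation function, so the scoring formula below, the weights `alpha` and `beta`, and the names `ebv_like_partition` and `replication_factor` are assumptions made for illustration only, not the authors' definition. The sketch also assumes a simple undirected graph given as a list of distinct `(u, v)` pairs.

```python
from collections import defaultdict


def ebv_like_partition(edges, num_parts, alpha=1.0, beta=1.0):
    """Greedy vertex-cut sketch; returns (edge -> part, vertex set per part).

    The score and the weights alpha/beta are illustrative assumptions.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    num_vertices = len(degree)
    num_edges = len(edges)

    # Process edges in ascending order of the sum of their end-vertex degrees.
    ordered = sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]])

    vertex_sets = [set() for _ in range(num_parts)]
    edge_counts = [0] * num_parts
    assignment = {}

    for u, v in ordered:
        def score(p):
            # Endpoints that would need a new copy in part p.
            new_replicas = (u not in vertex_sets[p]) + (v not in vertex_sets[p])
            # Loads normalised by the ideal per-part share (1.0 = balanced).
            edge_load = edge_counts[p] * num_parts / num_edges
            vertex_load = len(vertex_sets[p]) * num_parts / num_vertices
            return new_replicas + alpha * edge_load + beta * vertex_load

        best = min(range(num_parts), key=score)
        assignment[(u, v)] = best
        edge_counts[best] += 1
        vertex_sets[best].update((u, v))

    return assignment, vertex_sets


def replication_factor(vertex_sets):
    """Average number of copies per vertex across all parts."""
    all_vertices = set().union(*vertex_sets)
    return sum(len(s) for s in vertex_sets) / len(all_vertices)


if __name__ == "__main__":
    # Small hypothetical graph: a hub (vertex 0) plus a few low-degree edges.
    sample = [(0, 1), (0, 2), (0, 3), (0, 4), (3, 4), (5, 6)]
    parts, vsets = ebv_like_partition(sample, num_parts=2)
    print(parts)
    print(replication_factor(vsets))
```

Larger `alpha` and `beta` push the sketch toward balanced parts at the cost of more replicated vertices, while smaller values favour fewer replicas. In this sketch, sorting by degree sum means edges between low-degree vertices are placed first and edges touching high-degree hubs are placed last.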