Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity (optimal graph partitioning is NP-hard), another important consideration is memory overhead. Real-world graphs are often so large that loading the complete graph into memory for partitioning is uneconomical or even infeasible. Currently, the common approach to reducing memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms achieve reasonable partitioning quality on some graphs, they are still not fully competitive with in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit only partly into memory while yielding high partitioning quality. HEP can flexibly adapt its memory overhead by separating the edge set of the graph into two subsets. One subset is partitioned by NE++, a novel, efficient in-memory algorithm, while the other subset is partitioned by a streaming approach. Our evaluations on large real-world graphs show that, in many cases, HEP outperforms both in-memory partitioning and streaming partitioning at the same time. Hence, HEP is an attractive alternative to existing solutions that cannot fine-tune their memory overheads. Finally, we show that using HEP, we achieve a significant speedup of distributed graph processing jobs on Spark/GraphX compared to state-of-the-art partitioning algorithms.
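The abstract does not state how HEP decides which edges belong to which subset. The Python sketch below shows one way such a hybrid split could work, under explicitly assumed rules. The function names `split_edges` and `greedy_stream` and the threshold parameter `tau` are hypothetical. The rule that streams an edge only when both endpoints exceed `tau` times the mean degree is an illustrative assumption. The streaming step is a simplified greedy vertex-cut heuristic, swapped in as a stand-in and not HEP's actual streaming algorithm. NE++ is not reproduced here, and the in-memory subset is only returned.

```python
from collections import Counter, defaultdict


def split_edges(edges, tau):
    """Split an edge list into an in-memory subset and a streamed subset.

    ASSUMPTION (illustrative only): a vertex is "high-degree" if its degree
    exceeds tau * mean degree, and an edge is streamed only if BOTH of its
    endpoints are high-degree. Larger tau keeps more edges in memory.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    mean_degree = 2 * len(edges) / len(degree) if degree else 0.0
    threshold = tau * mean_degree

    in_memory, streamed = [], []
    for u, v in edges:
        if degree[u] > threshold and degree[v] > threshold:
            streamed.append((u, v))
        else:
            in_memory.append((u, v))
    return in_memory, streamed


def greedy_stream(edges, k):
    """Assign each streamed edge to one of k partitions in a single pass.

    Simplified greedy vertex-cut rule (a stand-in, not HEP's streaming step):
    prefer partitions that already replicate both endpoints, then either
    endpoint, then any partition; break ties by lowest load, then lowest id.
    """
    replicas = defaultdict(set)  # vertex -> partitions holding a replica
    load = [0] * k               # number of edges assigned per partition
    assignment = []
    for u, v in edges:
        common = replicas[u] & replicas[v]
        either = replicas[u] | replicas[v]
        candidates = common or either or set(range(k))
        p = min(candidates, key=lambda q: (load[q], q))
        assignment.append(p)
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    return assignment
```

In this sketch, `tau` is the knob that trades memory for quality: raising it keeps more edges in the in-memory subset, while lowering it pushes more edges to the single-pass streaming step.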