As graph-structured data continues to grow, graph processing systems that can both scale out and scale up are needed to handle extreme-scale datasets. Although existing distributed out-of-core solutions have made this possible, their performance is limited by excessive I/O and communication costs. We present DFOGraph, a distributed fully-out-of-core graph processing system that applies and combines multiple techniques to enable I/O- and communication-efficient processing. DFOGraph builds on a two-level column-oriented partition with adaptive compressed representations, which allows fine-grained selective computation and communication, so that it issues only the necessary disk and network requests. Our evaluation shows that DFOGraph achieves performance comparable to GridGraph and FlashGraph (>2.52x and 1.06x, respectively) on a single machine and significantly outperforms Chaos and HybridGraph (>12.94x and >10.82x, respectively) when scaling out.
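The abstract does not spell out the data layout, so the following is only a minimal in-memory sketch, under our own assumptions, of what a two-level column-oriented partition with an adaptive compressed representation and selective processing could look like. Edges are first split into columns by destination range (for example, one column per machine), and each column is then split into blocks by source range. A block is stored as plain CSR when enough of its source rows have edges, otherwise as doubly compressed DCSR (only non-empty rows kept). The names `partition`, `build_block`, `push`, and the 0.5 density threshold are hypothetical and are not taken from DFOGraph.

```python
from bisect import bisect_left
from dataclasses import dataclass


def ranges(n, k):
    """Split vertex ids [0, n) into k contiguous, near-equal ranges."""
    return [(i * n // k, (i + 1) * n // k) for i in range(k)]


@dataclass
class Block:
    """Edges with source in [src_lo, src_hi) and destination in one column."""
    src_lo: int
    src_hi: int
    kind: str      # "csr": one row per source in range; "dcsr": non-empty rows only
    rows: list     # dcsr only: sorted ids of sources that have edges
    offsets: list  # edges of row i are dsts[offsets[i]:offsets[i + 1]]
    dsts: list

    def out_edges(self, src):
        """Destinations of src inside this block (empty list if none)."""
        if self.kind == "csr":
            i = src - self.src_lo
        else:
            i = bisect_left(self.rows, src)
            if i == len(self.rows) or self.rows[i] != src:
                return []
        return self.dsts[self.offsets[i]:self.offsets[i + 1]]


def build_block(edges, src_lo, src_hi, dense_threshold):
    """Choose CSR or DCSR for one block based on the fraction of non-empty rows."""
    by_src = {}
    for s, d in sorted(edges):
        by_src.setdefault(s, []).append(d)
    span = src_hi - src_lo
    dense = span > 0 and len(by_src) / span >= dense_threshold
    rows = list(range(src_lo, src_hi)) if dense else sorted(by_src)
    offsets, dsts = [0], []
    for r in rows:
        dsts.extend(by_src.get(r, []))
        offsets.append(len(dsts))
    return Block(src_lo, src_hi, "csr" if dense else "dcsr",
                 [] if dense else rows, offsets, dsts)


def partition(n, edges, num_cols, num_rows, dense_threshold=0.5):
    """Level 1: columns by destination range. Level 2: blocks by source range."""
    grid = []
    for c_lo, c_hi in ranges(n, num_cols):
        col_edges = [(s, d) for s, d in edges if c_lo <= d < c_hi]
        grid.append([
            build_block([(s, d) for s, d in col_edges if r_lo <= s < r_hi],
                        r_lo, r_hi, dense_threshold)
            for r_lo, r_hi in ranges(n, num_rows)])
    return grid


def push(grid, active):
    """Yield (src, dst) for active sources only; blocks without any active
    source are skipped entirely (in an out-of-core system, never read)."""
    active = sorted(active)
    for column in grid:
        for blk in column:
            lo = bisect_left(active, blk.src_lo)
            hi = bisect_left(active, blk.src_hi)
            if lo == hi:
                continue  # no active source in this block's range
            for s in active[lo:hi]:
                for d in blk.out_edges(s):
                    yield s, d


edges = [(0, 1), (0, 5), (1, 2), (3, 7), (6, 0), (7, 4)]
grid = partition(8, edges, num_cols=2, num_rows=2)
print(sorted(push(grid, {0, 7})))  # [(0, 1), (0, 5), (7, 4)]
```

In this toy graph each column's first block has edges from half of its source rows and is stored as CSR, while the second block has a single non-empty row and is stored as DCSR. Skipping blocks with no active source is how the sketch mimics "only necessary" disk reads; how DFOGraph actually schedules I/O and network messages is not described in the abstract.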