Recent studies have shown that single-machine graph processing systems can be highly competitive with cluster-based approaches on large-scale problems. Although several out-of-core graph processing systems and computation models have been proposed, high disk I/O overhead can significantly reduce their performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine and reduce the number of disk accesses for edges. Extensive evaluations show that GraphMP can outperform state-of-the-art systems such as GraphChi, X-Stream and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular graph applications on a billion-vertex graph.
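To make the first two techniques concrete, the following is a minimal in-memory sketch, not GraphMP's actual implementation. It assumes (the abstract does not specify this) that edges are grouped into shards by destination-vertex interval, that all vertex values stay in memory while shards are visited one at a time, and that a shard can be skipped when none of its source vertices changed in the previous iteration. The names `build_shards`, `min_label_propagation` and `shard_size` are hypothetical, the application (minimum-label propagation along directed edges) is chosen only for illustration, and the compressed edge cache is omitted.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source, destination)


def build_shards(edges: List[Edge], num_vertices: int, shard_size: int) -> List[List[Edge]]:
    """Group edges by destination-vertex interval (assumed shard layout)."""
    num_shards = (num_vertices + shard_size - 1) // shard_size
    shards: List[List[Edge]] = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // shard_size].append((src, dst))
    return shards


def min_label_propagation(
    edges: List[Edge], num_vertices: int, shard_size: int
) -> Tuple[List[int], int]:
    """Propagate the minimum label along directed edges.

    Returns the final labels and the number of shards that were "loaded".
    """
    shards = build_shards(edges, num_vertices, shard_size)
    # Per-shard set of source vertices, used to decide whether a shard can be skipped.
    shard_sources: List[Set[int]] = [{src for src, _ in shard} for shard in shards]

    labels = list(range(num_vertices))  # vertex values are kept in memory
    active: Set[int] = set(range(num_vertices))  # vertices updated in the last iteration
    shards_loaded = 0

    while active:
        new_labels = labels[:]
        next_active: Set[int] = set()
        for shard, sources in zip(shards, shard_sources):
            # Selective scheduling (assumed rule): skip a shard whose sources are all inactive.
            if sources.isdisjoint(active):
                continue
            shards_loaded += 1  # stands in for reading one edge shard from disk
            for src, dst in shard:
                if labels[src] < new_labels[dst]:
                    new_labels[dst] = labels[src]
                    next_active.add(dst)
        labels = new_labels
        active = next_active
    return labels, shards_loaded


if __name__ == "__main__":
    # Hypothetical six-vertex graph: a chain 0 -> 1 -> 2 -> 3 and an edge 4 -> 5.
    example_edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
    print(min_label_propagation(example_edges, num_vertices=6, shard_size=2))
```

In this sketch only edges are streamed shard by shard, and vertex values are never written out between iterations. Later iterations touch only the shards whose source vertices changed in the previous iteration, which is the intuition behind skipping unnecessary shards.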