As large graph processing emerges, we observe a costly fork-processing pattern (FPP) that is common in many graph algorithms. The unique feature of the FPP is that it launches many independent queries from different source vertices on the same graph. For example, an algorithm that analyzes the network community profile can execute Personalized PageRank queries starting from tens of thousands of source vertices at the same time. We study the efficiency of state-of-the-art graph processing systems on multi-core architectures, including Ligra, Gemini, and GraphIt. We find that those systems suffer from severe cache miss penalties because of the irregular and uncoordinated memory accesses in processing FPPs. In this paper, we propose ForkGraph, a cache-efficient FPP processing system on multi-core architectures. To improve cache reuse, we divide the graph into partitions, each sized to fit in the last-level cache (LLC), and the queries in an FPP are buffered and executed on a per-partition basis. We further develop efficient intra- and inter-partition execution strategies. For intra-partition processing, since the graph partition fits into the LLC, we propose to execute each graph query with efficient sequential algorithms (in contrast with the parallel algorithms in existing parallel graph processing systems) and present atomic-free query processing by consolidating contending operations within the cache-resident graph partition. For inter-partition processing, we propose yielding and priority-based scheduling to reduce redundant work. Furthermore, we theoretically prove that ForkGraph performs the same amount of work, to within a constant factor, as the fastest known sequential algorithms for FPP query processing, i.e., ForkGraph is work-efficient. Our evaluations on real-world graphs show that ForkGraph significantly outperforms state-of-the-art graph processing systems with two orders of magnitude speedups.
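To make the partition-buffered execution idea concrete, the following is a minimal sketch, not the authors' implementation: the graph is split into LLC-sized partitions, each partition keeps a buffer of pending per-query updates, one partition's buffered work is drained sequentially while its subgraph is cache-resident, and updates that cross a partition boundary are deferred into the owning partition's buffer. All names (Update, drain_partition, owner, ...) are hypothetical, and multi-source BFS stands in for the independent FPP queries; the paper's actual intra-partition algorithms and yielding/priority-based scheduling are more sophisticated than the round-robin loop shown here.

```cpp
// Minimal sketch of partition-buffered FPP execution (assumed structure, not
// ForkGraph's actual code). Multi-source BFS stands in for the FPP queries.
#include <deque>
#include <iostream>
#include <limits>
#include <vector>

struct Update { int query; int vertex; int dist; };  // one pending relaxation

struct Graph {
    std::vector<std::vector<int>> adj;  // adjacency lists, global vertex ids
    std::vector<int> owner;             // vertex id -> partition id
};

// Drain one partition: apply its buffered updates with a plain sequential
// relaxation loop while the partition is (assumed) resident in the LLC, and
// buffer any update that targets a vertex owned by another partition instead
// of touching that partition's memory now.
void drain_partition(int pid, const Graph& g,
                     std::vector<std::deque<Update>>& buffers,
                     std::vector<std::vector<int>>& dist) {
    auto& buf = buffers[pid];
    while (!buf.empty()) {
        Update u = buf.front();
        buf.pop_front();
        if (u.dist >= dist[u.query][u.vertex]) continue;  // stale update
        dist[u.query][u.vertex] = u.dist;
        for (int v : g.adj[u.vertex]) {
            // Local neighbors land back in this partition's buffer and are
            // processed in the same drain; remote neighbors wait their turn.
            buffers[g.owner[v]].push_back({u.query, v, u.dist + 1});
        }
    }
}

int main() {
    // Tiny 6-vertex graph split into two partitions: {0,1,2} and {3,4,5}.
    Graph g;
    g.adj = {{1, 3}, {0, 2}, {1, 5}, {0, 4}, {3, 5}, {2, 4}};
    g.owner = {0, 0, 0, 1, 1, 1};

    const int num_queries = 2, num_parts = 2, n = 6;
    std::vector<std::vector<int>> dist(
        num_queries, std::vector<int>(n, std::numeric_limits<int>::max()));
    std::vector<std::deque<Update>> buffers(num_parts);

    // "Fork" two independent BFS queries from source vertices 0 and 5.
    buffers[g.owner[0]].push_back({0, 0, 0});
    buffers[g.owner[5]].push_back({1, 5, 0});

    // Round-robin over partitions until every buffer is empty.
    bool pending = true;
    while (pending) {
        pending = false;
        for (int p = 0; p < num_parts; ++p) {
            if (!buffers[p].empty()) {
                drain_partition(p, g, buffers, dist);
                pending = true;
            }
        }
    }

    for (int q = 0; q < num_queries; ++q) {
        std::cout << "query " << q << ":";
        for (int d : dist[q]) std::cout << ' ' << d;
        std::cout << '\n';
    }
    return 0;
}
```

The key design point the sketch illustrates is that all queries touching a partition are batched and served while that partition is hot in the cache, so the irregular accesses of many independent queries are coordinated around a cache-sized working set rather than scattered across the whole graph.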