Cache-Efficient Fork-Processing Patterns on Large Graphs

As large-scale graph processing becomes widespread, we observe a costly fork-processing pattern (FPP) that is common in many graph algorithms. The distinguishing feature of an FPP is that it launches many independent queries from different source vertices on the same graph. For example, an algorithm that analyzes the network community profile can execute Personalized PageRank queries starting from tens of thousands of source vertices at the same time. We study how efficiently state-of-the-art graph processing systems handle FPPs on multi-core architectures. We find that these systems suffer severe cache-miss penalties because processing an FPP produces irregular and uncoordinated memory accesses. In this paper, we propose ForkGraph, a cache-efficient FPP processing system for multi-core architectures. To improve cache reuse, we divide the graph into partitions, each sized to the capacity of the last-level cache (LLC), and buffer and execute the queries of an FPP on a per-partition basis. We further develop efficient intra- and inter-partition execution strategies. For intra-partition processing, since the graph partition fits into the LLC, we execute each graph query with an efficient sequential algorithm (in contrast to the parallel algorithms used in existing parallel graph processing systems). We also present atomic-free query processing that consolidates contending operations onto the cache-resident graph partition. For inter-partition processing, we propose yielding and priority-based scheduling to reduce redundant work. In addition, we prove that ForkGraph is work-efficient: to within a constant factor, it performs the same amount of work as the fastest known sequential algorithms for processing FPP queries. Our evaluations on real-world graphs show that ForkGraph significantly outperforms state-of-the-art graph processing systems, with speedups of two orders of magnitude.
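
The abstract describes the execution model only in prose. The Python sketch below illustrates the partition-buffered idea on a hypothetical FPP of single-source shortest-path (SSSP) queries. It is not ForkGraph's implementation, and several pieces are assumptions made for illustration: the function name `fork_sssp`, the `partition_size` parameter standing in for "as many vertices as fit in the LLC", contiguous vertex-range partitioning, and the scheduling rule of picking the partition whose buffered work has the smallest tentative distance. Multi-core parallelism, atomic-free consolidation, and yielding are not modeled. Each query runs a sequential Dijkstra confined to the current partition. Updates that cross into another partition are deferred to that partition's shared buffer.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def fork_sssp(adj, sources, partition_size):
    """Hypothetical sketch: one SSSP query per source, processed partition by partition.

    adj[u] is a list of (v, weight) pairs with non-negative weights.
    partition_size is an assumed stand-in for "vertices that fit in the LLC".
    Returns dist, where dist[q][v] is the distance from sources[q] to v.
    """
    n = len(adj)
    num_parts = (n + partition_size - 1) // partition_size
    dist = [[INF] * n for _ in sources]
    # One buffer per partition holds pending (distance, query, vertex) work from ALL queries.
    buffers = [[] for _ in range(num_parts)]
    for q, s in enumerate(sources):
        dist[q][s] = 0
        buffers[s // partition_size].append((0, q, s))

    while True:
        # Simplified priority-based scheduling (an assumption): pick the non-empty
        # buffer whose smallest tentative distance is lowest.
        candidates = [(min(item[0] for item in buf), p)
                      for p, buf in enumerate(buffers) if buf]
        if not candidates:
            break
        _, p = min(candidates)
        work, buffers[p] = buffers[p], []

        # Group buffered work by query. Each query then runs sequentially over
        # partition p, whose vertices and edges are assumed to be cache-resident.
        by_query = defaultdict(list)
        for d, q, v in work:
            by_query[q].append((d, v))
        for q, items in by_query.items():
            dq = dist[q]
            heap = [(d, v) for d, v in items if d == dq[v]]  # drop stale entries
            heapq.heapify(heap)
            while heap:
                d, u = heapq.heappop(heap)
                if d > dq[u]:
                    continue  # a shorter path to u was found after this entry was pushed
                for v, w in adj[u]:
                    nd = d + w
                    if nd < dq[v]:
                        dq[v] = nd
                        target = v // partition_size
                        if target == p:
                            heapq.heappush(heap, (nd, v))  # stays inside this partition
                        else:
                            buffers[target].append((nd, q, v))  # defer to the owning partition
    return dist


if __name__ == "__main__":
    adj = [[(1, 1), (2, 4)], [(2, 1), (3, 5)], [(3, 1)], [(0, 2)]]
    print(fork_sssp(adj, sources=[0, 3], partition_size=2))
    # [[0, 1, 2, 3], [2, 3, 4, 0]]
```

Because each partition's buffer collects work from every query, one visit to a partition serves many queries while that partition is in cache. The stale-entry checks keep each query's distances correct even when a vertex is revisited across partition visits.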
