Large-scale graph processing has drawn great attention in recent years. Most modern datacenter workloads, including MapReduce-style jobs, can be expressed as graph processing. Consequently, many domain-specific accelerators have been proposed for graph processing. Spatial architectures are promising for this workload: the graph is partitioned across several processing nodes, and the nodes work in parallel. We conduct experiments to analyze on-chip data movement during graph processing on a spatial architecture. Based on these observations, we identify a data movement bottleneck in the execution of such highly parallel accelerators. To mitigate the bottleneck, we propose a novel power-law-aware graph partitioning and data mapping scheme that reduces communication latency by minimizing hop counts on a scalable network-on-chip. Experimental results on popular graph algorithms show that, by reducing data movement time, our implementation makes execution 2-5x faster and 2.7-4x more energy-efficient than a baseline implementation.
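As a rough illustration of why degree-aware placement can lower hop counts, the sketch below compares two vertex-to-tile mappings on a small mesh. It is not the scheme proposed in the paper, whose details the abstract does not give. It assumes a 2D mesh network-on-chip with dimension-ordered routing (so the hop count between two tiles is their Manhattan distance) and at most one vertex per tile in the example. All function names are hypothetical.

```python
from collections import defaultdict


def manhattan_hops(a, b):
    """Hop count between two mesh tiles, assuming dimension-ordered routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_tiles(rows, cols):
    """All tile coordinates of a rows x cols mesh, in row-major order."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def naive_mapping(vertices, rows, cols):
    """Baseline: place vertices on tiles in row-major order of vertex id."""
    tiles = mesh_tiles(rows, cols)
    return {v: tiles[i % len(tiles)] for i, v in enumerate(sorted(vertices))}


def degree_aware_mapping(edges, rows, cols):
    """Illustrative heuristic: highest-degree vertices go to the most central tiles."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    centre = ((rows - 1) / 2, (cols - 1) / 2)
    # Tiles ordered from the mesh centre outwards (stable sort keeps ties row-major).
    tiles = sorted(
        mesh_tiles(rows, cols),
        key=lambda t: abs(t[0] - centre[0]) + abs(t[1] - centre[1]),
    )
    # Vertices ordered by descending degree, ties broken by vertex id.
    ranked = sorted(degree, key=lambda v: (-degree[v], v))
    return {v: tiles[i % len(tiles)] for i, v in enumerate(ranked)}


def total_hops(edges, mapping):
    """Sum of hop counts over all edges, one message per edge."""
    return sum(manhattan_hops(mapping[u], mapping[v]) for u, v in edges)


# A star graph is an extreme case of a skewed degree distribution:
# one hub (vertex 0) connected to eight leaves, mapped onto a 3x3 mesh.
star_edges = [(0, leaf) for leaf in range(1, 9)]
baseline = total_hops(star_edges, naive_mapping(range(9), 3, 3))
degree_aware = total_hops(star_edges, degree_aware_mapping(star_edges, 3, 3))
print(baseline, degree_aware)
```

In this toy case the hub sits in a corner under the row-major baseline and in the centre under the degree-aware heuristic, so the total hop count drops from 18 to 12. Real power-law graphs have many more vertices than tiles, so a practical scheme would also have to decide how vertices are grouped into partitions before placement.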