Processing large graphs on memory-constrained GPUs must contend with host-to-GPU data transfer, a key performance bottleneck. Existing GPU-accelerated graph processing frameworks reduce data transfers by managing the transfer of the active subgraph at runtime. Some frameworks adopt explicit transfer management, based on explicit memory copies with filtering or compaction. In contrast, others adopt implicit transfer management, based on on-demand access with zero-copy or unified memory. Through intensive analysis, we find that as the set of active vertices evolves, the relative performance of the two approaches varies across workloads. Owing to heavy redundant data transfers, high CPU compaction overhead, or low bandwidth utilization, adopting a single approach often yields suboptimal performance. In this work, we propose a hybrid transfer management approach that combines the merits of both approaches at runtime, with the objective of achieving the shortest execution time in each iteration. Based on this hybrid approach, we present HyTGraph, a GPU-accelerated graph processing framework, which is further empowered by a set of effective task scheduling optimizations. Our experimental results on real-world and synthetic graphs demonstrate that HyTGraph achieves up to a 10.27X speedup over existing GPU-accelerated graph processing systems, including Grus, Subway, and EMOGI.