For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs present two significant challenges to developing a programmable, high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock balances performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that lets programmers quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures across a wide range of graph primitives, from traversal-based and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock achieves on average at least an order-of-magnitude speedup over Boost and PowerGraph, performance comparable to the fastest GPU hardwired primitives and to CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
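To make the frontier-centric, bulk-synchronous idea concrete, here is a minimal CPU-side sketch of breadth-first search written as repeated operations on a vertex frontier. It is an illustration only, not Gunrock's API or implementation: the function names `advance`, `filter_unvisited`, and `bfs`, the CSR arrays `row_offsets` and `col_indices`, and the sequential loops (which a GPU system would run in parallel) are all assumptions made for this example.

```python
from typing import List


def advance(row_offsets: List[int], col_indices: List[int],
            frontier: List[int]) -> List[int]:
    """Expand every frontier vertex to its neighbors (duplicates allowed)."""
    out: List[int] = []
    for v in frontier:
        # In CSR form, the neighbors of v are col_indices[row_offsets[v]:row_offsets[v + 1]].
        out.extend(col_indices[row_offsets[v]:row_offsets[v + 1]])
    return out


def filter_unvisited(candidates: List[int], depth: List[int],
                     level: int) -> List[int]:
    """Keep each not-yet-visited vertex once and label it with the current level."""
    out: List[int] = []
    for v in candidates:
        if depth[v] == -1:
            depth[v] = level
            out.append(v)
    return out


def bfs(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Return BFS depth per vertex; -1 marks vertices unreachable from source."""
    num_vertices = len(row_offsets) - 1
    depth = [-1] * num_vertices
    depth[source] = 0
    frontier = [source]
    level = 0
    # Each loop iteration is one bulk-synchronous step: the whole input
    # frontier is consumed before the next frontier is processed.
    while frontier:
        level += 1
        candidates = advance(row_offsets, col_indices, frontier)
        frontier = filter_unvisited(candidates, depth, level)
    return depth
```

For a six-vertex graph with edges 0→1, 0→2, 1→3, 2→3, and 3→4 (vertex 5 isolated), calling `bfs([0, 2, 3, 4, 5, 5, 5], [1, 2, 3, 3, 4], 0)` returns `[0, 1, 1, 2, 3, -1]`. The program is expressed entirely in terms of what happens to the current frontier, which is the sense in which the abstraction is data-centric rather than organized around per-vertex programs.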