In a general graph data structure such as an adjacency matrix, when edges are homogeneous, the connectivity between two nodes can be fully represented by a single bit. This insight, however, has not yet been adequately exploited by existing matrix-centric graph processing frameworks. This work fills the gap by systematically exploring bit-level representations of graphs and the corresponding optimizations of graph operations. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to graph operations on B2SR that leverage the intrinsics of modern GPUs. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations deliver speedups of up to $40\times$ and $6555\times$ for the essential GraphBLAS kernels SpMV and SpGEMM, respectively. These speedups accelerate GraphBLAS-based BFS by up to $433\times$, SSSP, PR, and CC by up to $35\times$, and TC by up to $52\times$.
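The abstract does not spell out the B2SR layout. The Python sketch below illustrates one plausible reading of "two-level" under stated assumptions, not the authors' implementation. The outer level is assumed to be CSR over fixed-size square tiles. Each nonzero tile is assumed to store one bitmask per tile row. The tile size `TILE = 8` and the names `build_b2sr`, `pack_vector`, `unpack_vector`, and `bool_spmv` are hypothetical. The sketch also shows a Boolean-semiring SpMV, the operation behind one BFS step. Within a tile row, a single bitwise AND against a packed frontier word stands in for per-edge checks. On a GPU this would map to word-level integer intrinsics, which the sketch does not model.

```python
from typing import Dict, List, Tuple

TILE = 8  # assumed tile dimension (hypothetical choice for this sketch)


def build_b2sr(n: int, edges: List[Tuple[int, int]]):
    """Pack an n x n binary adjacency matrix into a B2SR-like layout (assumed).

    Level 1: CSR over TILE x TILE blocks (block_rowptr, block_colidx).
    Level 2: each nonzero block keeps TILE row bitmasks; bit c of row r is set
    iff edge (block_row*TILE + r, block_col*TILE + c) exists.
    """
    nb = (n + TILE - 1) // TILE
    tiles: Dict[Tuple[int, int], List[int]] = {}
    for r, c in edges:
        rows = tiles.setdefault((r // TILE, c // TILE), [0] * TILE)
        rows[r % TILE] |= 1 << (c % TILE)

    block_rowptr = [0] * (nb + 1)
    block_colidx: List[int] = []
    block_bits: List[List[int]] = []
    for br, bc in sorted(tiles):  # row-major order of nonzero blocks
        block_colidx.append(bc)
        block_bits.append(tiles[(br, bc)])
        block_rowptr[br + 1] += 1
    for br in range(nb):  # prefix sum turns per-row counts into offsets
        block_rowptr[br + 1] += block_rowptr[br]
    return block_rowptr, block_colidx, block_bits


def pack_vector(n: int, vertices: List[int]) -> List[int]:
    """Pack a vertex set into one TILE-bit word per block."""
    bits = [0] * ((n + TILE - 1) // TILE)
    for v in vertices:
        bits[v // TILE] |= 1 << (v % TILE)
    return bits


def unpack_vector(bits: List[int]) -> List[int]:
    """Return the sorted vertex ids whose bits are set."""
    return [b * TILE + i for b, word in enumerate(bits)
            for i in range(TILE) if (word >> i) & 1]


def bool_spmv(b2sr, x_bits: List[int]) -> List[int]:
    """y = A x over the Boolean (OR, AND) semiring, with x and y bit-packed."""
    block_rowptr, block_colidx, block_bits = b2sr
    nb = len(block_rowptr) - 1
    y_bits = [0] * nb
    for br in range(nb):
        acc = 0
        for k in range(block_rowptr[br], block_rowptr[br + 1]):
            xb = x_bits[block_colidx[k]]
            for r, rowmask in enumerate(block_bits[k]):
                if rowmask & xb:  # row r has an edge into the frontier
                    acc |= 1 << r
        y_bits[br] = acc
    return y_bits
```

For example, with `b = build_b2sr(10, [(0, 1), (1, 9), (9, 0), (3, 3)])`, calling `unpack_vector(bool_spmv(b, pack_vector(10, [0, 3])))` yields `[3, 9]`, the vertices with an edge into the set {0, 3}. The tile-level CSR skips empty blocks entirely, while the bitmasks let one integer operation cover up to `TILE` potential edges at once.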