We describe a simple parallel-friendly lightweight graph reordering algorithm for COO graphs (edge lists). Our ``Batched Order By Attachment'' (BOBA) algorithm is linear in the number of edges in terms of reads and linear in the number of vertices for writes through to main memory. It is highly parallelizable on GPUs\@. We show that, compared to a randomized baseline, the ordering produced gives improved locality of reference in sparse matrix-vector multiplication (SpMV) as well as other graph algorithms. Moreover, it can substantially speed up the conversion from a COO representation to the compressed format CSR, a very common workflow. Thus, it can give \emph{end-to-end} speedups even in SpMV\@. Unlike other lightweight approaches, this reordering does not rely on explicitly knowing the degrees of the vertices, and indeed its runtime is comparable to that of computing degrees. Instead, it uses the structure and edge distribution inherent in the input edge list, making it a candidate for default use in a pragmatic graph creation pipeline. This algorithm is suitable for road-type networks as well as scale-free. It improves cache locality on both CPUs and GPUs, achieving hit rates similar to the heavyweight techniques (e.g., for SpMV, 7--52\% and 11--67\% in the L1 and L2 caches, respectively). Compared to randomly labeled graphs, BOBA-reordered graphs achieve end-to-end speedups of up to 3.45. The reordering time is approximately one order of magnitude faster than existing lightweight techniques and up to 2.5 orders of magnitude faster than heavyweight techniques.
翻译:暂无翻译