Sparse general matrix multiplication (SpGEMM) is a fundamental building block for many real-world applications. Because SpGEMM is memory-bound, with a large volume of irregular memory accesses, memory access efficiency is critical to optimizing it. Yet existing methods pay little attention to the memory subsystem and therefore achieve suboptimal performance. In this paper, we thoroughly analyze the memory access patterns of SpGEMM and their influence on the memory subsystem. Based on this analysis, we propose BRMerge, a novel and more efficient accumulation method for multi-core CPU architectures. BRMerge follows the row-wise dataflow. It first accesses the $B$ matrix, generates the intermediate lists for one output row, and stores them in a consecutive memory space implemented by a ping-pong buffer. It then immediately merges the intermediate lists generated in the previous phase two by two in a tree-like hierarchy, alternating between the two ping-pong buffers. The architectural benefits of BRMerge are 1) streaming access patterns, 2) minimized TLB misses, and 3) reasonably high L1/L2 cache hit rates, which together yield low access latency and high bandwidth utilization when performing SpGEMM. Based on BRMerge, we propose two SpGEMM libraries, BRMerge-Upper and BRMerge-Precise, which use different allocation methods. Performance evaluations with 26 commonly used benchmarks on two CPU servers show that the proposed libraries significantly outperform state-of-the-art SpGEMM libraries.
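The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`merge_two`, `brmerge_row`, `spgemm`) are hypothetical, the matrices are assumed to be in CSR format with column indices sorted within each row, and plain Python lists stand in for the contiguous ping-pong buffers, so the sketch shows only the dataflow and not the cache or TLB behavior.

```python
def merge_two(cols, vals, lo, mid, hi, out_cols, out_vals):
    """Merge sorted lists cols[lo:mid] and cols[mid:hi], summing equal columns."""
    i, j = lo, mid
    while i < mid and j < hi:
        if cols[i] < cols[j]:
            out_cols.append(cols[i])
            out_vals.append(vals[i])
            i += 1
        elif cols[i] > cols[j]:
            out_cols.append(cols[j])
            out_vals.append(vals[j])
            j += 1
        else:  # same column index: accumulate the two partial products
            out_cols.append(cols[i])
            out_vals.append(vals[i] + vals[j])
            i += 1
            j += 1
    # Copy whatever remains of either list.
    out_cols.extend(cols[i:mid])
    out_vals.extend(vals[i:mid])
    out_cols.extend(cols[j:hi])
    out_vals.extend(vals[j:hi])


def brmerge_row(a_cols, a_vals, b_indptr, b_indices, b_data):
    """Compute one output row of C = A * B from one row of A (assumed CSR)."""
    # Phase 1: scale each referenced row of B and append it to the "ping" buffer.
    ping_cols, ping_vals = [], []
    bounds = [0]  # intermediate list i occupies ping[bounds[i]:bounds[i + 1]]
    for k, a in zip(a_cols, a_vals):
        start, end = b_indptr[k], b_indptr[k + 1]
        ping_cols.extend(b_indices[start:end])
        ping_vals.extend(a * v for v in b_data[start:end])
        bounds.append(len(ping_cols))

    # Phase 2: merge adjacent lists pairwise into "pong", then swap roles.
    while len(bounds) > 2:
        pong_cols, pong_vals, new_bounds = [], [], [0]
        for i in range(0, len(bounds) - 1, 2):
            lo, mid = bounds[i], bounds[i + 1]
            # An unpaired last list is merged with an empty range (plain copy).
            hi = bounds[i + 2] if i + 2 < len(bounds) else mid
            merge_two(ping_cols, ping_vals, lo, mid, hi, pong_cols, pong_vals)
            new_bounds.append(len(pong_cols))
        ping_cols, ping_vals, bounds = pong_cols, pong_vals, new_bounds
    return ping_cols, ping_vals


def spgemm(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    """Row-wise SpGEMM over CSR inputs, returning C in CSR form."""
    c_indptr, c_indices, c_data = [0], [], []
    for r in range(len(a_indptr) - 1):
        s, e = a_indptr[r], a_indptr[r + 1]
        cols, vals = brmerge_row(a_indices[s:e], a_data[s:e],
                                 b_indptr, b_indices, b_data)
        c_indices.extend(cols)
        c_data.extend(vals)
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```

Each merge level reads one buffer sequentially and writes the other sequentially, which is the streaming access pattern the abstract refers to. A production version would presumably preallocate both buffers once per thread rather than growing lists; how the output is sized is where the two libraries' allocation methods would differ, and this sketch does not model that.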