Accelerating the SpMV kernel on standard CPUs by exploiting the partially diagonal structures

Sparse matrix-vector multiplication (SpMV) is one of the basic building blocks of scientific computing, and there is a continuing demand to accelerate it. In this research, we aim to accelerate SpMV on recent CPUs for sparse matrices that have a specific sparsity structure, namely a diagonally structured sparsity pattern. We focus on a hybrid storage format that combines the DIA and CSR formats, the so-called HDC format. First, we recall the importance of introducing cache blocking techniques into HDC-based SpMV kernels. Next, based on observations of the cache-blocked kernel, we present a modified version of the HDC format, which we call the M-HDC format, in which partial diagonal structures are expected to be captured more efficiently. For these SpMV kernels, we theoretically analyze the expected performance improvement using performance models. We then conduct comprehensive experiments on state-of-the-art multi-core CPUs. Through experiments with typical matrices, we clarify the detailed performance characteristics of each SpMV kernel. We also evaluate the performance for matrices that appear in practical applications and demonstrate that our approach can accelerate SpMV for some of them. Overall, we demonstrate that exploiting partial diagonal structures with the M-HDC format is a promising approach to accelerating SpMV on CPUs for a certain class of practical sparse matrices.
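The abstract does not spell out the storage layout, so the following is only a minimal sketch of the general idea behind a DIA/CSR hybrid: diagonals that are sufficiently dense are stored in DIA form, and the remaining nonzeros go to CSR. The function names `build_hdc` and `spmv_hdc`, the `threshold` parameter, and the fill-ratio selection rule are assumptions made for illustration, not the authors' implementation. Cache blocking and the M-HDC modification are not shown.

```python
from collections import defaultdict


def build_hdc(dense, threshold=0.5):
    """Split a square matrix (list of lists) into a DIA part and a CSR part.

    Assumed selection rule (hypothetical): the diagonal with offset k = j - i
    is stored in DIA form when at least `threshold` of its entries are nonzero.
    """
    n = len(dense)

    # Count the nonzeros on each diagonal.
    counts = defaultdict(int)
    for i in range(n):
        for j in range(n):
            if dense[i][j] != 0:
                counts[j - i] += 1

    # The diagonal with offset k has n - |k| entries.
    offsets = sorted(k for k, c in counts.items()
                     if c >= threshold * (n - abs(k)))
    chosen = set(offsets)

    # DIA part: one zero-padded array of length n per diagonal, indexed by row.
    dia = [[0.0] * n for _ in offsets]
    for d, k in enumerate(offsets):
        for i in range(max(0, -k), min(n, n - k)):
            dia[d][i] = dense[i][i + k]

    # CSR part: every nonzero that is not on a chosen diagonal.
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            if dense[i][j] != 0 and (j - i) not in chosen:
                col_idx.append(j)
                vals.append(dense[i][j])
        row_ptr.append(len(vals))

    return offsets, dia, row_ptr, col_idx, vals


def spmv_hdc(n, offsets, dia, row_ptr, col_idx, vals, x):
    """Compute y = A x from the hybrid representation."""
    y = [0.0] * n

    # DIA part: contiguous access along each diagonal, no column indices.
    for d, k in enumerate(offsets):
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += dia[d][i] * x[i + k]

    # CSR part: indirect access through col_idx for the leftover nonzeros.
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[p] * x[col_idx[p]]

    return y


# Tridiagonal matrix plus two off-band entries that end up in the CSR part.
A = [[ 2, -1,  7,  0,  0],
     [-1,  2, -1,  0,  0],
     [ 5, -1,  2, -1,  0],
     [ 0,  0, -1,  2, -1],
     [ 0,  0,  0, -1,  2]]
x = [1, 2, 3, 4, 5]

hdc = build_hdc(A)
y = spmv_hdc(len(A), *hdc, x)
```

In this sketch the three band diagonals are handled by the DIA loop, which needs no column indices and reads `x` contiguously, while only the two off-band entries pay the cost of indirect addressing in the CSR loop. That split is the source of the potential speedup for diagonally structured matrices.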
