Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet it typically performs poorly on many devices. As a result, SpMV usually requires a carefully chosen storage format and device-specific tuning. Moreover, HPC increasingly relies on heterogeneous hardware that combines multiple kinds of compute units, e.g., many-core CPUs and GPUs. An emerging goal has therefore been to develop heterogeneous formats and methods that allow critical kernels, such as SpMV, to run on different devices with portable performance and minimal changes to the format and method. This paper presents CSR-k, a heterogeneous format based on CSR that can be tuned quickly. CSR-k outperforms the average performance of Intel MKL on Intel Xeon Platinum 8380 and AMD Epyc 7742 CPUs while also outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 GPUs for regular sparse matrices, i.e., sparse matrices in which the number of nonzeros per row has a variance $\leq 10$, such as those commonly generated by two- and three-dimensional finite difference and finite element problems. CSR-k achieves this through reordering and by grouping rows into a hierarchical structure of super-rows and super-super-rows, which is represented by just a few extra arrays of pointers. Because the format is this simple, a model can be tuned for a device and used to select super-row and super-super-row sizes in constant time.
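To make the hierarchy concrete, the following is a minimal sketch in plain Python of how a CSR matrix might be augmented with two extra pointer arrays, one grouping rows into super-rows and one grouping super-rows into super-super-rows, and how SpMV can then traverse the three levels. All names (`build_group_ptr`, `csrk_spmv`, `is_regular`, `sr_ptr`, `ssr_ptr`) are hypothetical, and the sketch assumes fixed, contiguous group sizes with no reordering; the actual CSR-k format uses reordering and device-tuned sizes, and a real implementation would map the outer levels to parallel hardware units. The regularity check assumes "variance" means the population variance of nonzeros per row.

```python
from statistics import pvariance


def build_group_ptr(n_items, group_size):
    """Pointer array splitting range(n_items) into contiguous groups.

    Assumption: fixed group size; the last group may be shorter.
    """
    ptr = list(range(0, n_items, group_size))
    ptr.append(n_items)
    return ptr


def csrk_spmv(row_ptr, col_idx, vals, sr_ptr, ssr_ptr, x):
    """y = A @ x, traversing super-super-rows -> super-rows -> CSR rows."""
    y = [0.0] * (len(row_ptr) - 1)
    # Outer level: super-super-rows (hypothetically, one per core or block).
    for ssr in range(len(ssr_ptr) - 1):
        # Middle level: the super-rows belonging to this super-super-row.
        for sr in range(ssr_ptr[ssr], ssr_ptr[ssr + 1]):
            # Inner level: ordinary CSR rows belonging to this super-row.
            for row in range(sr_ptr[sr], sr_ptr[sr + 1]):
                acc = 0.0
                for k in range(row_ptr[row], row_ptr[row + 1]):
                    acc += vals[k] * x[col_idx[k]]
                y[row] = acc
    return y


def is_regular(row_ptr, max_variance=10.0):
    """True if the population variance of nonzeros per row is <= max_variance."""
    counts = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return pvariance(counts) <= max_variance


def laplacian_1d(n):
    """CSR arrays for the tridiagonal [-1, 2, -1] matrix (a regular example)."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


if __name__ == "__main__":
    row_ptr, col_idx, vals = laplacian_1d(5)
    sr_ptr = build_group_ptr(5, 2)                 # [0, 2, 4, 5]: 3 super-rows
    ssr_ptr = build_group_ptr(len(sr_ptr) - 1, 2)  # [0, 2, 3]: 2 super-super-rows
    print(csrk_spmv(row_ptr, col_idx, vals, sr_ptr, ssr_ptr,
                    [1.0, 2.0, 3.0, 4.0, 5.0]))   # prints [0.0, 0.0, 0.0, 0.0, 6.0]
    print(is_regular(row_ptr))                     # prints True
```

Because the hierarchy is expressed only through `sr_ptr` and `ssr_ptr`, the underlying CSR arrays are left unchanged, which is consistent with the abstract's point that only a few extra pointer arrays are needed.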