Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV normally suffers from poor performance on many devices. Because of this, SpMV normally requires special care in how the matrix is stored and tuned for a given device. This calls for storage formats and tunings that allow efficient SpMV operations with low memory and low tuning overheads across heterogeneous devices. Additionally, the primary users of SpMV operations in HPC are normally application scientists who already depend on numerous other libraries that use some standard sparse matrix storage format. As such, the ideal heterogeneous format would also be easy to understand and would require no major changes to be translated into a standard sparse matrix format, such as compressed sparse row (CSR). This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly, requires minimal memory overheads, and outperforms the average performance of NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels, while being on par with Intel MKL on our test suite. Additionally, CSR-k does not need any conversion to be used by standard library calls that require a CSR format input. In particular, CSR-k achieves this by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned for a device, and this model can be used to select super-row and super-super-row sizes in constant time. We observe in this paper that CSR-k can achieve about 17.3% improvement on an NVIDIA V100 and about 18.9% improvement on an NVIDIA A100 over NVIDIA's cuSPARSE while still performing on par with Intel MKL on an Intel Xeon Platinum 8380 and an AMD Epyc 7742.
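To make the idea of "a few extra arrays of pointers" concrete, the following is a minimal Python sketch under stated assumptions, not the paper's implementation. The names `sr_ptr`, `ssr_ptr`, `build_csrk_pointers`, and `spmv_csrk` are hypothetical. Super-rows are assumed to hold a fixed number of consecutive rows, and super-super-rows a fixed number of consecutive super-rows; the paper instead selects these sizes with a tuned device model, which is not reproduced here. The loops run sequentially, whereas a real implementation would map each level of the hierarchy onto parallel hardware. The point the sketch shows is that the standard CSR arrays are left untouched.

```python
from typing import List, Sequence, Tuple


def build_csrk_pointers(n_rows: int, rows_per_sr: int,
                        srs_per_ssr: int) -> Tuple[List[int], List[int]]:
    """Hypothetical fixed-size grouping (an assumption, not the paper's model).

    sr_ptr[i]  = index of the first row of super-row i (last entry = n_rows)
    ssr_ptr[j] = index of the first super-row of super-super-row j
    """
    sr_ptr = list(range(0, n_rows, rows_per_sr)) + [n_rows]
    n_sr = len(sr_ptr) - 1
    ssr_ptr = list(range(0, n_sr, srs_per_ssr)) + [n_sr]
    return sr_ptr, ssr_ptr


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Plain CSR SpMV, y = A x, for comparison."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_csrk(row_ptr: Sequence[int], col_idx: Sequence[int],
              vals: Sequence[float], x: Sequence[float],
              sr_ptr: Sequence[int], ssr_ptr: Sequence[int]) -> List[float]:
    """SpMV that walks the (hypothetical) super-super-row / super-row
    hierarchy. The CSR arrays are only read, never reordered or copied."""
    y = [0.0] * (len(row_ptr) - 1)
    for ssr in range(len(ssr_ptr) - 1):                   # coarsest level
        for sr in range(ssr_ptr[ssr], ssr_ptr[ssr + 1]):  # middle level
            for row in range(sr_ptr[sr], sr_ptr[sr + 1]): # individual rows
                acc = 0.0
                for k in range(row_ptr[row], row_ptr[row + 1]):
                    acc += vals[k] * x[col_idx[k]]
                y[row] = acc
    return y


if __name__ == "__main__":
    # 5x4 example matrix:
    # [[10, 0, 0, 2],
    #  [ 0, 3, 0, 0],
    #  [ 0, 0, 0, 0],
    #  [ 1, 0, 4, 5],
    #  [ 0, 6, 0, 0]]
    row_ptr = [0, 2, 3, 3, 6, 7]
    col_idx = [0, 3, 1, 0, 2, 3, 1]
    vals = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0, 6.0]
    x = [1.0, 2.0, 3.0, 4.0]

    sr_ptr, ssr_ptr = build_csrk_pointers(len(row_ptr) - 1, 2, 2)
    print(sr_ptr, ssr_ptr)  # [0, 2, 4, 5] [0, 2, 3]
    print(spmv_csrk(row_ptr, col_idx, vals, x, sr_ptr, ssr_ptr))
    # [18.0, 6.0, 0.0, 33.0, 12.0]
```

Because `row_ptr`, `col_idx`, and `vals` are never modified, the same arrays can still be handed to a routine that expects plain CSR (here `spmv_csr`, which returns the same result). The hierarchy costs only two extra arrays whose lengths scale with the number of super-rows and super-super-rows, not with the number of nonzeros.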