The performance of the SpMV kernel varies widely with the input matrix and the computing platform. GPUs have long been considered the state of the art for SpMV, but the emergence of advanced multicore CPUs and low-power FPGA accelerators calls for revisiting the kernel's performance and energy efficiency. This paper provides a high-level SpMV performance analysis based on structural features of matrices that relate to four common bottlenecks: memory-bandwidth intensity, low ILP, load imbalance and memory latency overheads. To this end, we create a wide artificial matrix dataset that spans these features and study the performance of different storage formats on nine modern HPC platforms: five CPUs, three GPUs and an FPGA. After validating our proposed methodology using real-world matrices, we analyze our extensive experimental results and draw key insights on the competitiveness of different target architectures for SpMV and the impact of each feature/bottleneck on its performance.
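For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV over the widely used CSR storage format, together with two simple row-length statistics of the kind that could indicate load imbalance. The function names `spmv_csr` and `row_length_stats`, and the max-over-mean definition of imbalance, are illustrative assumptions and are not taken from the paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The indirect access x[col_idx[k]] is irregular, which is one
        # reason SpMV tends to be sensitive to memory latency.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_length_stats(row_ptr):
    """Return (mean nonzeros per row, max/mean row-length ratio).

    Hypothetical feature definitions, chosen only for illustration:
    a ratio well above 1 suggests uneven work across rows.
    """
    n_rows = len(row_ptr) - 1
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    mean_len = sum(lengths) / n_rows
    imbalance = max(lengths) / mean_len if mean_len > 0 else 0.0
    return mean_len, imbalance


# Example matrix (3 x 3):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

y = spmv_csr(row_ptr, col_idx, values, x)
mean_len, imbalance = row_length_stats(row_ptr)
```

Each nonzero contributes one multiply-add but requires loading a value, a column index and an element of `x`, which illustrates why the kernel is typically bound by memory bandwidth rather than by arithmetic throughput.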