Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high-bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For energy/bandwidth efficiency, Serpens is 1.71x/1.99x, 1.90x/2.69x, and 6.25x/4.06x better compared with GraphLily, Sextans, and the K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 60.55 GFLOP/s (30,204 MTEPS) and up to 3.79x over GraphLily. The code is available at https://github.com/UCLA-VAST/Serpens.
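For readers unfamiliar with the operation, the computation the abstract describes can be sketched with a minimal CSR (compressed sparse row) SpMV in plain Python. This is only an illustration of the general y = Ax kernel; it does not reflect Serpens's actual on-chip format, processing engines, or index-coalescing scheme.

```python
def spmv_csr(indptr, indices, data, x):
    """Multiply a CSR-format sparse matrix by a dense vector x.

    indptr[row]..indptr[row+1] delimits the nonzeros of each row in
    the parallel arrays `data` (values) and `indices` (column ids).
    The gather x[indices[k]] is the random-access pattern that makes
    SpMV memory-bound and motivates HBM-based accelerators.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# 3x3 example:  [[2, 0, 1],
#                [0, 3, 0],
#                [4, 0, 5]] @ [1, 2, 3]
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0]))  # [5.0, 6.0, 19.0]
```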