Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high-bandwidth-memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM-based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. In an evaluation on twelve large matrices, Serpens delivers 1.91x and 1.76x higher geomean throughput than the latest accelerators GraphLily and Sextans, respectively. On 2,519 SuiteSparse matrices, Serpens achieves 2.10x higher throughput than a K80 GPU. In energy efficiency, Serpens is 1.71x, 1.90x, and 42.7x better than GraphLily, Sextans, and the K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 30,204 MTEPS and up to a 3.79x speedup over GraphLily.
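For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV (y = A * x) in plain C++ using the common compressed sparse row (CSR) layout; the CsrMatrix type and spmv function are hypothetical names for illustration, not the Serpens design, but the indirect x[col_idx[k]] load shows the random memory access pattern the abstract refers to.

```cpp
// Minimal CSR-based SpMV sketch (illustrative only, not the Serpens design).
#include <cstdio>
#include <vector>

struct CsrMatrix {
    int rows;
    std::vector<int>   row_ptr;  // size rows + 1; offsets into col_idx/vals
    std::vector<int>   col_idx;  // column index of each nonzero
    std::vector<float> vals;     // value of each nonzero
};

// y[i] = sum over nonzeros (i, j) of A[i][j] * x[j].
// The indirect load x[col_idx[k]] is the random access that makes
// memory systems (and accelerator design) hard for SpMV.
std::vector<float> spmv(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (int i = 0; i < a.rows; ++i)
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.vals[k] * x[a.col_idx[k]];
    return y;
}

int main() {
    // 3x3 example: [[2 0 1], [0 3 0], [4 0 5]] times x = [1, 1, 1].
    CsrMatrix a{3, {0, 2, 3, 5}, {0, 2, 1, 0, 2}, {2, 1, 3, 4, 5}};
    std::vector<float> y = spmv(a, {1.0f, 1.0f, 1.0f});
    for (float v : y) std::printf("%g\n", v);  // prints 3, 3, 9
    return 0;
}
```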