Top-K SpMV is a key component of similarity search on sparse embeddings. This sparse workload performs poorly on general-purpose NUMA systems that rely on traditional caching strategies. Modern FPGA accelerator cards, in contrast, have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering a bandwidth efficiency that is often unreachable even on architectures with higher peak bandwidth. On HBM-based boards, our design is 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power efficiency.
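
For readers unfamiliar with the operation, the following is a minimal software sketch of Top-K SpMV over a standard CSR matrix: each row of the sparse embedding matrix is scored against a dense query vector, and only the K highest-scoring rows are kept. It is a plain reference formulation, not the FPGA design described above, and it models neither the reduced precision nor the packet-wise CSR compression. The function name `topk_spmv` and the toy data are assumptions made purely for illustration.

```python
import heapq


def topk_spmv(indptr, indices, data, x, k):
    """Return the k (score, row) pairs with the largest row-vector dot products.

    indptr, indices, data: standard CSR arrays of the sparse matrix.
    x: dense query vector.
    """
    heap = []  # min-heap holding the best k (score, row) pairs seen so far
    for row in range(len(indptr) - 1):
        # Sparse dot product of this row with the dense vector.
        score = 0.0
        for j in range(indptr[row], indptr[row + 1]):
            score += data[j] * x[indices[j]]
        if len(heap) < k:
            heapq.heappush(heap, (score, row))
        elif score > heap[0][0]:
            # Replace the current weakest candidate.
            heapq.heapreplace(heap, (score, row))
    # Highest score first.
    return sorted(heap, reverse=True)


# Toy 4x4 sparse matrix (hypothetical data):
# row 0: (col 0, 1.0), (col 2, 2.0)
# row 1: (col 1, 3.0)
# row 2: (col 0, 0.5), (col 3, 4.0)
# row 3: (col 2, 1.0), (col 3, 1.0)
indptr = [0, 2, 3, 5, 7]
indices = [0, 2, 1, 0, 3, 2, 3]
data = [1.0, 2.0, 3.0, 0.5, 4.0, 1.0, 1.0]
x = [1.0, 0.5, 2.0, 0.25]

print(topk_spmv(indptr, indices, data, x, k=2))  # [(5.0, 0), (2.25, 3)]
```

The inner loop reads `x` at irregular column positions, which is the access pattern that makes this workload hard on cache-based systems. The min-heap keeps only K candidates, so the full score vector is never stored.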