The attention mechanism spends enormous computational effort on unnecessary calculations, which significantly limits system performance. Researchers have proposed sparse attention, which converts some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM) sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based methods to perform the SDDMM and SpMM operations. Experimental results show that CPSAA achieves average performance improvements of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X and average energy savings of 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
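For readers unfamiliar with the DDMM-to-SDDMM/SpMM decomposition the abstract refers to, the following is a minimal NumPy sketch of the sparse-attention dataflow. It is our illustration only, not CPSAA's hardware dataflow: the binary pruning mask, the shapes, and the dense emulation of the sparse steps are all hypothetical, and a real accelerator would compute only the unmasked entries.

```python
import numpy as np

def sparse_attention(Q, K, V, mask):
    """Sketch of sparse attention with a binary pruning mask.

    mask is a (seq_len x seq_len) 0/1 matrix: 1 keeps a score, 0 prunes it.
    The dense Q @ K^T (DDMM) becomes an SDDMM because only masked positions
    are needed, and scores @ V becomes an SpMM because the score matrix is
    sparse.  Both steps are emulated densely here for clarity.
    """
    d = Q.shape[-1]
    # SDDMM step: sampled dense-dense matmul; only unmasked scores matter.
    scores = (Q @ K.T) / np.sqrt(d)
    scores = np.where(mask.astype(bool), scores, -np.inf)
    # Row-wise softmax over the surviving scores.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # SpMM step: the (now sparse) probability matrix times the dense V.
    return probs @ V

# Toy usage: 8 tokens, 4-dim head, hypothetical causal pruning mask.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
mask = np.tril(np.ones((8, 8)))
out = sparse_attention(Q, K, V, mask)
print(out.shape)  # (8, 4)
```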