Sparse tensor computation is a core component of numerous applications in areas such as data science, graph processing, and scientific computing. Sparse tensors offer the potential to skip the unnecessary computations caused by zero values. In this paper, we propose a new strategy for extending row-wise-product sparse tensor accelerators. We introduce a new processing element, called Maple, that uses multiple multiply-accumulate (MAC) units to exploit local clusters of non-zero values, increasing parallelism and reducing data movement. Maple operates on the compressed sparse row (CSR) format and, guided by the sparsity pattern, computes only the non-zero elements of the input matrices. Furthermore, Maple can serve as a basic building block in a variety of spatial tensor accelerators that follow a row-wise product approach. As a proof of concept, we integrate Maple into two reference accelerators, ExTensor and MatRaptor. Our experiments show that using Maple in MatRaptor and ExTensor yields 50% and 60% energy savings and 15% and 22% speedup over the baseline designs, respectively. Maple also reduces area by 5.9x and 15.5x in MatRaptor and ExTensor, respectively, compared with the baseline structures.
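To make the row-wise product approach concrete, the following is a minimal software sketch (not the paper's hardware design) of Gustavson-style sparse matrix multiplication over CSR: for each non-zero A[i, k], the non-zero row k of B is scaled and accumulated into row i of the output, so only non-zero operands ever reach the MAC operations. The function name and CSR array layout here are illustrative assumptions.

```python
# Illustrative sketch: row-wise product (Gustavson) SpGEMM on CSR inputs.
# CSR layout assumed: ptr[i]..ptr[i+1] delimits the non-zeros of row i,
# with their column indices in idx and values in val.
def spgemm_rowwise(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # column -> partial sum for row i of the output
        for p in range(a_ptr[i], a_ptr[i + 1]):      # non-zeros A[i, k]
            k, a = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):  # non-zeros B[k, j]
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a * b_val[q]  # MAC: only non-zero pairs
        for j in sorted(acc):                        # emit row i in CSR order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In hardware, a processing element in this style would parallelize the inner accumulation across MAC units; the sketch only illustrates the dataflow and the zero-skipping property.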