AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers

Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing. Despite their efficacy, accelerating the transformer is challenging due to its quadratic computational complexity and large activation sizes. Existing transformer accelerators attempt to prune its tokens to reduce memory access, albeit with high compute overheads. Moreover, previous works directly operate on large matrices involved in the attention operation, which limits hardware utilization. In order to address these challenges, this work proposes a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead, substantially reducing the number of ineffectual operations. This improves the throughput of transformer inference. We further propose tiling the matrices in transformer operations along with diverse dataflows to improve data reuse, thus enabling higher energy efficiency. To effectively implement these methods, we propose AccelTran, a novel accelerator architecture for transformers. Extensive experiments with different models and benchmarks demonstrate that DynaTran achieves higher accuracy than the state-of-the-art top-k hardware-aware pruning strategy while attaining up to 1.2$\times$ higher sparsity. One of our proposed accelerators, AccelTran-Edge, achieves 330K$\times$ higher throughput with 93K$\times$ lower energy requirement when compared to a Raspberry Pi device. On the other hand, AccelTran-Server achieves 5.73$\times$ higher throughput and 3.69$\times$ lower energy consumption compared to the state-of-the-art transformer co-processor, Energon.
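The contrast between runtime activation pruning and top-k pruning, and the benefit of tiling, can be illustrated with a small sketch. The abstract does not say how DynaTran selects activations to prune, so the sketch assumes a simple magnitude threshold (`tau`). All function names here (`threshold_prune`, `topk_prune`, `sparsity`, `tiled_matmul`) are hypothetical and written in plain Python for illustration. This is not the authors' implementation or the AccelTran hardware dataflow.

```python
import random


def threshold_prune(matrix, tau):
    """Zero every entry whose magnitude is below tau (assumed pruning rule).

    Needs only one comparison per entry, which is why it is cheap at runtime.
    """
    return [[x if abs(x) >= tau else 0.0 for x in row] for row in matrix]


def topk_prune(matrix, k):
    """Keep the k largest-magnitude entries of each row; needs a per-row sort."""
    pruned = []
    for row in matrix:
        order = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        keep = set(order[:k])
        pruned.append([x if j in keep else 0.0 for j, x in enumerate(row)])
    return pruned


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for x in row if x == 0.0)
    return zeros / total


def tiled_matmul(a, b, tile):
    """Multiply a (n x m) by b (m x p) in tiles, skipping all-zero tiles of a.

    Returns the product and the number of tiles that were skipped.
    """
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    skipped = 0
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            rows = range(i0, min(i0 + tile, n))
            cols = range(k0, min(k0 + tile, m))
            # An all-zero tile contributes nothing, so no work is issued for it.
            if all(a[i][k] == 0.0 for i in rows for k in cols):
                skipped += 1
                continue
            for i in rows:
                for k in cols:
                    aik = a[i][k]
                    if aik == 0.0:
                        continue  # ineffectual multiply-accumulate
                    for j in range(p):
                        out[i][j] += aik * b[k][j]
    return out, skipped


if __name__ == "__main__":
    random.seed(0)
    acts = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
    weights = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    pruned = threshold_prune(acts, tau=0.5)
    _, skipped = tiled_matmul(pruned, weights, tile=2)
    print(f"sparsity after thresholding: {sparsity(pruned):.2f}")
    print(f"all-zero tiles skipped: {skipped}")
```

In this sketch, thresholding touches each activation once, whereas top-k must rank every row. That difference is one plausible reading of the "low overhead" claim, not a statement of the paper's measured costs.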
