Self-attention is a key enabler of state-of-the-art accuracy for various transformer-based Natural Language Processing models. The attention mechanism calculates a correlation score for each word with respect to the other words in a sentence. Commonly, only a small subset of words correlates highly with the word under attention, and this subset can only be determined at runtime. As such, a significant amount of computation is inconsequential due to low attention scores and can potentially be pruned. The main challenge is finding the threshold below which subsequent score computation becomes inconsequential. Although such a threshold is discrete, this paper formulates its search through a soft differentiable regularizer integrated into the training loss function. This formulation piggybacks on the back-propagation training to analytically co-optimize the threshold and the weights simultaneously, striking a formally optimal balance between accuracy and computation pruning. To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with a bit-level early-termination microarchitectural mechanism. We evaluate our design across 43 back-end tasks for the MemN2N, BERT, ALBERT, GPT-2, and Vision Transformer models. Post-layout results show that, on average, LeOPArd yields 1.9x speedup and 3.9x energy reduction while keeping the average accuracy virtually intact (<0.2% degradation).
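A minimal sketch of how such a learned, differentiable pruning threshold could be folded into training is shown below. It is written in PyTorch under stated assumptions: the class name `SoftThresholdAttention`, the `gate_temperature` and `sparsity_weight` parameters, and the specific gating form are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftThresholdAttention(nn.Module):
    """Single-head attention with a learnable pruning threshold.

    Scores below the threshold are gated toward zero by a sigmoid, so the
    threshold remains differentiable and can be co-optimized with the model
    weights during back-propagation. All names and defaults here are
    illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, dim: int, gate_temperature: float = 10.0):
        super().__init__()
        self.scale = dim ** -0.5
        # Learned per-layer threshold on pre-softmax attention scores.
        self.threshold = nn.Parameter(torch.tensor(-1.0))
        self.gate_temperature = gate_temperature

    def forward(self, q, k, v):
        scores = (q @ k.transpose(-2, -1)) * self.scale
        # Soft gate: ~1 for scores above the threshold, ~0 below it.
        gate = torch.sigmoid(self.gate_temperature * (scores - self.threshold))
        probs = F.softmax(scores, dim=-1) * gate
        # Renormalize so the retained attention weights still sum to one.
        probs = probs / probs.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return probs @ v, gate


def pruning_regularizer(gate: torch.Tensor, sparsity_weight: float = 1e-3):
    """Soft differentiable regularizer added to the task loss.

    Penalizes the fraction of retained scores, pushing the threshold up
    (more pruning) while the task loss pushes back (accuracy), so
    back-propagation balances the two.
    """
    return sparsity_weight * gate.mean()
```

During training, `pruning_regularizer(gate)` would be added to the task loss; at inference, the soft gate can be replaced by a hard comparison against the learned threshold, which is the discrete decision a bit-serial pipeline like LeOPArd's can exploit to terminate low-score dot products early at the bit level.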