The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and relies on massive, power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. Observing that several energy-friendly operations cost far less energy than their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that, compared with the vanilla model, the proposed model achieves competitive accuracy while saving 99\% and 66\% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
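To make the idea of a multiplication-free alignment score concrete, here is a minimal sketch contrasting the vanilla scaled dot-product alignment with an addition-based alternative built only from subtractions, absolute values, and sums. This is an illustrative assumption, not the paper's exact operator: the function names and the choice of a negative L1 distance are hypothetical stand-ins for the selective/additive operations described above.

```python
import torch

def dot_product_alignment(q, k):
    # Vanilla scaled dot-product alignment: one multiply per feature dimension.
    d = q.size(-1)
    return q @ k.transpose(-2, -1) / d ** 0.5

def addition_based_alignment(q, k):
    # Hypothetical multiplication-free alignment: negative L1 distance,
    # computed with subtractions, absolute values, and sums only.
    diff = q.unsqueeze(-2) - k.unsqueeze(-3)   # (..., len_q, len_k, d)
    return -diff.abs().sum(dim=-1)             # (..., len_q, len_k)

# Toy usage: batch of 2, 5 queries, 7 keys, 64-dimensional features.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 7, 64)
attn_weights = torch.softmax(addition_based_alignment(q, k), dim=-1)
```

Both functions return an alignment matrix of shape (batch, len_q, len_k) that can be fed to softmax as usual; only the arithmetic inside the score changes.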