Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers

A wide variety of efficiency methods exists for natural language processing (NLP) tasks, including pruning, distillation, dynamic inference, and quantization. We can consider an efficiency method as an operator applied to a model. Naturally, we may construct a pipeline of multiple efficiency methods, i.e., apply multiple operators to the model sequentially. In this paper, we study the plausibility of this idea and, more importantly, the commutativity and cumulativeness of efficiency operators. We make two interesting observations: (1) efficiency operators are commutative: the order of efficiency methods within the pipeline has little impact on the final results; (2) efficiency operators are also cumulative: the final results of combining several efficiency methods can be estimated by combining the results of the individual methods. These observations deepen our understanding of efficiency operators and provide useful guidelines for their real-world applications.
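To make the operator view concrete, the following is a minimal sketch, not the paper's implementation. It treats each efficiency method as a function from model to model, builds a pipeline by sequential application, and checks the two properties on a toy model. Everything named here is hypothetical: `ToyModel`, the operators `prune`, `distill`, and `quantize`, their compression factors, and the choice to "combine" individual results by multiplying size ratios. The toy tracks only model size, so both properties hold exactly; with real Transformers and task accuracy they hold only approximately, as the abstract's wording ("little impact", "can be estimated") indicates.

```python
from dataclasses import dataclass, replace
from functools import reduce
from itertools import permutations
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a Transformer; tracks size-related fields only."""
    num_params: float    # number of weights
    bits_per_param: int  # storage precision per weight

    @property
    def size_bits(self) -> float:
        return self.num_params * self.bits_per_param


# An efficiency operator maps a model to a (smaller or faster) model.
Operator = Callable[[ToyModel], ToyModel]


def prune(model: ToyModel) -> ToyModel:
    # Assumed for illustration: pruning removes half of the weights.
    return replace(model, num_params=model.num_params * 0.5)


def distill(model: ToyModel) -> ToyModel:
    # Assumed for illustration: the student has a quarter of the weights.
    return replace(model, num_params=model.num_params * 0.25)


def quantize(model: ToyModel) -> ToyModel:
    # Assumed for illustration: weights are stored in 8 bits.
    return replace(model, bits_per_param=8)


def apply_pipeline(model: ToyModel, operators: Sequence[Operator]) -> ToyModel:
    """Apply the operators to the model one after another, left to right."""
    return reduce(lambda m, op: op(m), operators, model)


def size_ratio(model: ToyModel, base: ToyModel) -> float:
    """Size of `model` relative to the unmodified `base` model."""
    return model.size_bits / base.size_bits


def is_commutative(base: ToyModel, operators: Sequence[Operator],
                   tol: float = 1e-9) -> bool:
    """True if every ordering of the operators gives the same size ratio."""
    ratios = [size_ratio(apply_pipeline(base, order), base)
              for order in permutations(operators)]
    return max(ratios) - min(ratios) <= tol


def estimate_combined_ratio(base: ToyModel,
                            operators: Sequence[Operator]) -> float:
    """Estimate the pipeline's result from each operator applied alone.

    Assumption: individual size ratios combine by multiplication.
    """
    estimate = 1.0
    for op in operators:
        estimate *= size_ratio(op(base), base)
    return estimate


base = ToyModel(num_params=110e6, bits_per_param=32)
ops = [prune, distill, quantize]
actual = size_ratio(apply_pipeline(base, ops), base)
estimated = estimate_combined_ratio(base, ops)
print(is_commutative(base, ops), actual, estimated)
```

In a real study the size ratio would be replaced by measured quantities such as task accuracy and inference cost. The commutativity check would then compare orderings within a tolerance chosen for the task rather than the near-zero `tol` used here.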
