Multi-task language models show outstanding performance on various natural language understanding tasks with only a single model. However, these language models use an unnecessarily large number of model parameters, even when they are used for only a specific task. This paper proposes a novel training-free, pruning-based compression method for multi-task language models. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We prune the unimportant neurons task-specifically, leaving only the task-specific parameters. Furthermore, we extend our method so that it is applicable in low-resource and unsupervised settings. Because our compression method is training-free, it uses few computing resources and does not destroy the pre-trained knowledge of the language model. Experimental results on six widely used datasets show that our proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen-domain setting.
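The abstract describes the approach only at a high level; the sketch below illustrates one way attribution-guided, training-free neuron pruning can look in practice. It is a minimal sketch, not the paper's implementation: the |activation × gradient| score is a generic first-order attribution used as a stand-in for the paper's attribution method, and the `ToyFFN` module, the `prune_ffn_neurons` helper, and the squared-output loss are hypothetical placeholders for a real task model and task loss.

```python
# Minimal sketch of attribution-based, training-free neuron pruning.
# Assumptions (not from the paper): importance is approximated by
# |activation * gradient| of a task loss, and pruning is applied to the
# intermediate neurons of a feed-forward block.

import torch
import torch.nn as nn


class ToyFFN(nn.Module):
    """A Transformer-style feed-forward block: d_model -> d_ff -> d_model (hypothetical stand-in)."""

    def __init__(self, d_model: int = 64, d_ff: int = 256):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


def neuron_attribution(ffn: ToyFFN, inputs: torch.Tensor, loss_fn) -> torch.Tensor:
    """Score each intermediate neuron by |activation * d(loss)/d(activation)|,
    aggregated over a batch of task examples; no parameters are updated."""
    hidden = ffn.act(ffn.fc1(inputs))            # (batch, seq, d_ff)
    hidden.retain_grad()                         # keep the gradient of this intermediate tensor
    loss = loss_fn(ffn.fc2(hidden))
    loss.backward()
    return (hidden * hidden.grad).abs().sum(dim=(0, 1))  # one score per neuron, shape (d_ff,)


def prune_ffn_neurons(ffn: ToyFFN, scores: torch.Tensor, keep_ratio: float) -> None:
    """Zero out the weights of the least important intermediate neurons."""
    k = int(scores.numel() * keep_ratio)
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    with torch.no_grad():
        ffn.fc1.weight[~mask] = 0.0              # remove the pruned neurons' input weights
        ffn.fc1.bias[~mask] = 0.0
        ffn.fc2.weight[:, ~mask] = 0.0           # and their outgoing connections


if __name__ == "__main__":
    ffn = ToyFFN()
    x = torch.randn(8, 16, 64)                   # a small batch of task inputs
    # The squared-output loss below is only a placeholder for the task's own loss.
    scores = neuron_attribution(ffn, x, loss_fn=lambda out: out.pow(2).mean())
    prune_ffn_neurons(ffn, scores, keep_ratio=0.5)
```

In a real setting the scores would be accumulated over task-specific data with the task's supervised (or, in the unsupervised extension, label-free) objective, and a separate pruning mask would be kept per task so that one shared backbone serves all tasks.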