The outstanding performance and growing size of Large Language Models have led to increased attention in parameter-efficient learning. The two predominant approaches are Adapters and Pruning. Adapters freeze the model and attach a small trainable weight matrix alongside it, which significantly reduces the time and memory cost of training, but at the price of increased time and memory consumption during evaluation and inference. Pruning removes some weights and redistributes the remaining ones; it trades extremely high memory usage and training time for relatively cheap evaluation and inference. Consequently, training efficiency and inference efficiency cannot be obtained at the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieves high training-memory efficiency, speeds up training, and ensures no significant decrease in accuracy on the GLUE tasks, achieving training and inference efficiency at the same time.
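To make the adapter idea concrete, below is a minimal sketch, assuming PyTorch, of a bottleneck adapter: the pretrained layer is frozen and a small trainable side module is added on top of its output. The class name, bottleneck size, and initialization are illustrative assumptions, not the method proposed in this work.

```python
# Illustrative bottleneck adapter (hypothetical sketch, not the paper's method).
import torch
import torch.nn as nn

class AdapterLayer(nn.Module):
    def __init__(self, base_linear: nn.Linear, bottleneck: int = 64):
        super().__init__()
        self.base = base_linear
        # Freeze the pretrained weights: only the adapter is trained.
        for p in self.base.parameters():
            p.requires_grad = False
        d = base_linear.out_features
        # Small trainable side matrices (down- and up-projection).
        self.down = nn.Linear(d, bottleneck)
        self.up = nn.Linear(bottleneck, d)
        # Zero-init the up-projection so training starts from the
        # unmodified pretrained behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)
        # The adapter path adds extra computation at inference time,
        # which is the inference-cost drawback noted above.
        return h + self.up(torch.relu(self.down(h)))
```

Training then updates only the adapter parameters (e.g., passing `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer), which is why training is cheap while every forward pass, including at inference, pays for the added side path.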