The outstanding performance and growing size of large language models have drawn increasing attention to parameter-efficient learning. The two predominant approaches are adapters and pruning. Adapters freeze the model and attach a small trainable weight matrix on the side, which can significantly reduce training time and memory; the cost is that the added parameters increase time and memory consumption at evaluation and test time. Pruning removes some weights and redistributes the remaining ones, which trades extremely high training memory and time for relatively cheap evaluation and testing. Thus, training efficiency and inference efficiency cannot be obtained at the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieves high training-memory efficiency, speeds up training, and incurs no significant drop in accuracy on GLUE tasks, achieving training and inference efficiency at the same time.
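To make the two approaches concrete, here is a minimal PyTorch sketch, our own illustration rather than the method proposed in this work: a bottleneck adapter that trains a small side branch beside a frozen linear layer, and a simple magnitude-pruning routine that zeroes the smallest weights. The class name, `bottleneck_dim`, and the 0.5 sparsity level are illustrative assumptions.

```python
# Illustrative sketch only: contrasts adapters and pruning on one linear layer.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Adapter: the base weight stays frozen; only a small low-rank
    side branch (down-project -> ReLU -> up-project) is trained."""

    def __init__(self, base: nn.Linear, bottleneck_dim: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.down = nn.Linear(base.out_features, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, base.out_features)
        nn.init.zeros_(self.up.weight)    # start as an identity residual
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)
        # The side branch adds extra compute at inference time as well,
        # which is the inference-cost drawback noted above.
        return h + self.up(torch.relu(self.down(h)))


def magnitude_prune(layer: nn.Linear, sparsity: float = 0.5) -> None:
    """Pruning: zero out the smallest-magnitude weights in place.
    Deciding what to prune (and retraining) is the expensive part;
    the resulting sparse layer is cheap at evaluation and test time."""
    with torch.no_grad():
        w = layer.weight.abs().flatten()
        k = max(1, int(sparsity * w.numel()))
        threshold = w.kthvalue(k).values
        mask = layer.weight.abs() > threshold
        layer.weight.mul_(mask)


if __name__ == "__main__":
    layer = nn.Linear(768, 768)
    adapted = BottleneckAdapter(layer, bottleneck_dim=16)
    x = torch.randn(4, 768)
    print(adapted(x).shape)                    # torch.Size([4, 768])
    magnitude_prune(layer, sparsity=0.5)
    print((layer.weight == 0).float().mean())  # roughly 0.5
```

The sketch makes the trade-off visible: the adapter trains only the `down`/`up` parameters but keeps both branches at inference, while pruning pays its cost up front and leaves a sparser layer behind.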