Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often achieve unsatisfactory performance due to the low-rank assumption. In this paper, we propose a novel method called Dual LoRA, which improves performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: a magnitude group that controls whether, and how far, a parameter should be updated, and a direction group that decides whether the parameter should move forward or backward, so as to better simulate the parameter updates of full fine-tuning under gradient-based optimization algorithms. We show that this can be achieved simply by applying a ReLU function to the magnitude group and a sign function to the direction group. We conduct experiments over a wide range of NLP tasks, including natural language understanding (NLU) and commonsense reasoning datasets, with RoBERTa, DeBERTa, and LLaMA-1/2/3 as base models. The results show that our method consistently outperforms LoRA and its state-of-the-art variants with the same number of trainable parameters.
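To make the mechanism concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: each group is parameterized by its own low-rank pair, and the two terms are combined elementwise. The class name `DualLoRALinear`, the initialization scales, the elementwise combination, and the straight-through estimator used to pass gradients through the non-differentiable sign are all illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


def ste_sign(x: torch.Tensor) -> torch.Tensor:
    # Assumed workaround: sign(x) in the forward pass, identity gradient in
    # the backward pass, since sign has zero gradient almost everywhere.
    return x + (torch.sign(x) - x).detach()


class DualLoRALinear(nn.Module):
    """Hypothetical sketch of a Dual-LoRA-style linear layer."""

    def __init__(self, in_features: int, out_features: int, r: int = 8):
        super().__init__()
        # Frozen pre-trained weight (randomly filled here for self-containment).
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            requires_grad=False,
        )
        # Magnitude group: ReLU keeps each entry >= 0 ("whether and how far").
        self.A_m = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B_m = nn.Parameter(torch.randn(out_features, r) * 0.01)
        # Direction group: sign gives each entry a +/-1 direction.
        self.A_d = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B_d = nn.Parameter(torch.randn(out_features, r) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        magnitude = torch.relu(self.B_m @ self.A_m)  # (out, in), entries >= 0
        direction = ste_sign(self.B_d @ self.A_d)    # (out, in), entries in {-1, 0, +1}
        return x @ (self.weight + magnitude * direction).t()
```

Two caveats on this sketch: the small random initialization of both factors departs from LoRA's zero-initialized B (a zero-initialized magnitude group would start in ReLU's dead zone and receive no gradient), and how the actual method handles the non-differentiability of sign is not specified in the abstract, so the straight-through estimator here is only one common choice.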