With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-intensive. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and only introduce learnable truncated-SVD modules (so-called LoRA blocks) into the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of the LoRA blocks, we have to re-train them from scratch); second, optimizing their rank requires an exhaustive and costly search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on natural language understanding (the GLUE benchmark) and natural language generation tasks (E2E, DART, and WebNLG) using pretrained models of different sizes, such as RoBERTa and GPT. Our results show that with DyLoRA we can train dynamic, search-free models at least 4 to 7 times faster (depending on the task) than LoRA without significantly compromising performance. Moreover, our models perform consistently well over a much larger range of ranks compared to LoRA.
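To make the idea concrete, below is a minimal sketch of a DyLoRA-style linear layer in PyTorch, written under our own assumptions rather than taken from the authors' implementation: the low-rank factors are allocated for the maximum rank, and at each training step a rank b is sampled from [r_min, r_max] so that only the first b components are updated, which implicitly orders the learned directions and lets the adapter be truncated to any rank in that range at inference time. The class name, initialization, and scaling choice are illustrative assumptions.

```python
# Hypothetical DyLoRA-style linear layer (sketch, not the authors' code).
import torch
import torch.nn as nn


class DyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r_min=1, r_max=8, alpha=16):
        super().__init__()
        # Frozen "pretrained" weight (stands in for the original layer weight).
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)
        self.r_min, self.r_max = r_min, r_max
        self.scaling = alpha / r_max  # scaling choice is an assumption
        # Low-rank factors sized for the maximum rank.
        self.lora_A = nn.Parameter(torch.randn(r_max, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r_max))

    def forward(self, x, rank=None):
        if rank is None:
            if self.training:
                # Sample a rank for this step; truncating to the first `rank`
                # components is what orders the learned representation.
                rank = int(torch.randint(self.r_min, self.r_max + 1, (1,)))
            else:
                rank = self.r_max
        A = self.lora_A[:rank, :]   # (rank, in_features)
        B = self.lora_B[:, :rank]   # (out_features, rank)
        return x @ self.weight.T + self.scaling * (x @ A.T @ B.T)
```

At inference, passing an explicit `rank` argument truncates the adapter to that rank without any retraining, which is the search-free behavior the abstract describes.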