With the ever-growing size of pre-trained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pre-trained weights of the model frozen and only introduce learnable truncated-SVD modules (so-called LoRA blocks) into the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of LoRA blocks, we need to re-train them from scratch); second, optimizing their rank requires an exhaustive and costly search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique that addresses both problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting out the representations learned by the adapter module at different ranks during training. We evaluate our solution on different tasks of the GLUE benchmark using the RoBERTa model. Our results show that we can train dynamic, search-free models with DyLoRA at least $7\times$ faster than LoRA without significantly compromising performance. Moreover, our models perform consistently well over a much wider range of ranks than LoRA.
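To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of a DyLoRA-style linear layer (the class name `DyLoRALinear` and all hyperparameters are ours, not from the paper): at each training step a rank $b$ is sampled from the target range and only the first $b$ rows/columns of the adapter factors are used, so after training the adapter can be truncated to any rank in that range without re-training.

```python
# Hypothetical sketch of the DyLoRA idea, not the authors' implementation.
import torch
import torch.nn as nn


class DyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r_min=1, r_max=8, alpha=16.0):
        super().__init__()
        # Frozen pre-trained weight (stands in for the PM's original layer).
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Trainable low-rank factors, allocated for the maximum rank.
        self.lora_down = nn.Parameter(torch.randn(r_max, in_features) * 0.01)
        self.lora_up = nn.Parameter(torch.zeros(out_features, r_max))
        self.r_min, self.r_max = r_min, r_max
        self.scaling = alpha / r_max

    def forward(self, x, rank=None):
        # During training, sample a rank b per step; at inference, pass any
        # fixed rank in [r_min, r_max] -- no re-training needed to change it.
        if rank is None:
            rank = int(torch.randint(self.r_min, self.r_max + 1, (1,)))
        down = self.lora_down[:rank]   # (b, in): first b rows only
        up = self.lora_up[:, :rank]    # (out, b): first b columns only
        return x @ self.weight.T + (x @ down.T) @ up.T * self.scaling
```

Because every step trains a truncated prefix of the adapter factors, the earlier rows/columns are exercised at all ranks, which is what lets one trained module serve the whole rank range instead of a single fixed rank.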