Large pretrained language models (PLMs) are often domain- or task-adapted via finetuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting, while prompting requires no training and few examples but limits performance. Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs. This difference is expressed in terms of model weights and sublayer structure through our proposed dynamic low-rank reparameterization and learned architecture controller. Experiments on few-shot dialogue completion, low-resource abstractive summarization, and multi-domain language modeling show improvements in adaptation time and performance over direct finetuning or preparation via domain-adaptive pretraining. Ablations show that our task-adaptive reparameterization (TARP) and model search (TAMS) components individually improve on other parameter-efficient transfer methods, such as adapters, and structure-learning methods, such as learned sparsification.
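The low-rank idea behind expressing an adaptation as a weight difference can be sketched as follows. This is a minimal illustration assuming a LoRA-style additive delta `W + B @ A`; the paper's dynamic reparameterization differs in detail, and all names here are illustrative, not the authors' implementation.

```python
import numpy as np

# Sketch (not the paper's exact formulation): represent the difference
# between the general and adapted weight matrix as a rank-r product,
#   W_adapted = W + B @ A,
# with B of shape (d_out, r) and A of shape (r, d_in), r << min(d_out, d_in).
# Only B and A are trained during adaptation; W stays frozen.

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
B = np.zeros((d_out, r))                    # trainable, zero-initialized
A = rng.standard_normal((r, d_in)) * 0.01   # trainable

def adapted_forward(x):
    # Equivalent to (W + B @ A) @ x, but cheaper to apply when r is small.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
full_params = d_out * d_in                  # updated by full finetuning
low_rank_params = r * (d_out + d_in)        # updated under the sketch
print(f"trainable params: {low_rank_params} vs full finetuning: {full_params}")
```

With `B` initialized to zero, the adapted model starts exactly at the pretrained model, and adaptation touches only `r * (d_out + d_in)` parameters instead of `d_out * d_in`.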