Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters are typically trained either on each task independently, which inhibits transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function that shares variable-size subsets of adapters across tasks. The adapters can subsequently be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants that perform weighted averaging of the adapters before few-shot adaptation (Poly-mu) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task-adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.
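To make the three routing regimes concrete, the sketch below illustrates the general idea with LoRA-style adapters in PyTorch: a shared inventory of adapters mixed per task by (i) a learned per-task routing vector (Polytropon-style), (ii) a fixed uniform average (Poly-mu-style), or (iii) per-head routing weights over slices of the input dimension (Poly-S-style). This is a minimal, illustrative sketch under simplifying assumptions, not the authors' implementation; all names (`RoutedLoRA`, `logits`, the head-slicing convention) are hypothetical.

```python
# Illustrative sketch of adapter-inventory routing (not the authors' code).
import torch
import torch.nn as nn


class RoutedLoRA(nn.Module):
    """A frozen linear layer augmented with an inventory of LoRA adapters
    mixed per task by a routing function.

    routing="learned"  : Polytropon-style, one mixing weight per (task, adapter)
    routing="average"  : Poly-mu-style, fixed uniform averaging of adapters
    routing="multihead": Poly-S-style, separate mixing weights per head,
                         i.e. per contiguous slice of the input dimension
    """

    def __init__(self, d_in, d_out, n_tasks, n_adapters, rank=4,
                 routing="learned", n_heads=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)  # stands in for a pretrained weight
        for p in self.base.parameters():    # base model stays frozen
            p.requires_grad_(False)
        self.routing, self.n_heads = routing, n_heads
        # Inventory of LoRA adapters: A projects down, B projects up.
        self.A = nn.Parameter(torch.randn(n_adapters, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_adapters, rank, d_out))
        if routing == "learned":
            self.logits = nn.Parameter(torch.zeros(n_tasks, n_adapters))
        elif routing == "multihead":
            self.logits = nn.Parameter(torch.zeros(n_tasks, n_heads, n_adapters))

    def forward(self, x, task_id):
        # x: (batch, d_in); task_id: integer index of the current task
        n_adapters = self.A.shape[0]
        if self.routing in ("average", "learned"):
            if self.routing == "average":
                w = torch.full((n_adapters,), 1.0 / n_adapters, device=x.device)
            else:
                w = torch.softmax(self.logits[task_id], dim=-1)  # (n_adapters,)
            # Mix the adapters, then apply the combined low-rank update.
            delta = torch.einsum("a,aik,akj->ij", w, self.A, self.B)
            return self.base(x) + x @ delta
        # multihead: split the input dimension into heads and route each
        # slice with its own mixing weights (finer-grained allocation).
        w = torch.softmax(self.logits[task_id], dim=-1)   # (n_heads, n_adapters)
        xs = x.chunk(self.n_heads, dim=-1)                # slices of x
        As = self.A.chunk(self.n_heads, dim=1)            # matching slices of A
        out = self.base(x)
        for h in range(self.n_heads):
            delta_h = torch.einsum("a,aik,akj->ij", w[h], As[h], self.B)
            out = out + xs[h] @ delta_h
        return out


if __name__ == "__main__":
    layer = RoutedLoRA(d_in=64, d_out=64, n_tasks=8, n_adapters=4,
                       routing="multihead")
    y = layer(torch.randn(2, 64), task_id=3)  # -> shape (2, 64)
```

Under this sketch, the extra parameter cost of the multi-head variant is only the routing tensor of size tasks x heads x adapters, consistent with the abstract's claim that the finer-grained allocation adds a negligible number of parameters relative to the adapters themselves.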