State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial performance improvements in few-shot domain generalization across a variety of tasks. Our code is publicly available at https://github.com/rabeehk/hyperformer.
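To make the conditioning concrete, below is a minimal PyTorch-style sketch of a shared hypernetwork that maps task, layer, and adapter-position embeddings to bottleneck adapter weights. The class name `AdapterHyperNet`, the embedding and bottleneck sizes, and the two-position setup are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch (not the authors' implementation) of a shared hypernetwork
# that generates adapter weights conditioned on task, layer id, and adapter
# position. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class AdapterHyperNet(nn.Module):
    """Generates down- and up-projection weights for a bottleneck adapter."""

    def __init__(self, num_tasks, num_layers, num_positions=2,
                 d_model=768, bottleneck=24, d_embed=64):
        super().__init__()
        # Learned embeddings for task id, layer id, and adapter position
        # (e.g. after the attention block vs. after the feed-forward block).
        self.task_emb = nn.Embedding(num_tasks, d_embed)
        self.layer_emb = nn.Embedding(num_layers, d_embed)
        self.pos_emb = nn.Embedding(num_positions, d_embed)
        # Project the concatenated condition to a single source embedding.
        self.condition = nn.Sequential(
            nn.Linear(3 * d_embed, d_embed), nn.ReLU())
        # Shared heads that emit the flattened adapter weight matrices,
        # so all tasks and layers reuse the same generator parameters.
        self.down_w = nn.Linear(d_embed, bottleneck * d_model)
        self.up_w = nn.Linear(d_embed, d_model * bottleneck)
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, task_id, layer_id, position_id):
        cond = torch.cat([self.task_emb(task_id),
                          self.layer_emb(layer_id),
                          self.pos_emb(position_id)], dim=-1)
        h = self.condition(cond)
        down = self.down_w(h).view(self.bottleneck, self.d_model)
        up = self.up_w(h).view(self.d_model, self.bottleneck)
        return down, up


# Usage: generate the adapter for task 0, layer 5, the feed-forward position,
# and apply it to a batch of hidden states with a residual connection.
hypernet = AdapterHyperNet(num_tasks=8, num_layers=12)
down, up = hypernet(torch.tensor(0), torch.tensor(5), torch.tensor(1))
hidden = torch.randn(4, 128, 768)                      # (batch, seq, d_model)
adapted = hidden + torch.relu(hidden @ down.T) @ up.T  # bottleneck adapter
print(adapted.shape)  # torch.Size([4, 128, 768])
```

Because only the hypernetwork and the small conditioning embeddings are trained, adding a new task costs one extra task embedding rather than a full set of per-layer adapters, which is how the shared-generator design keeps the per-task parameter overhead small.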