The massive number of trainable parameters in pre-trained language models (PLMs) makes them hard to deploy to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper examines existing methods along this line through the \textit{kernel lens}. Motivated by the connection between self-attention in transformer-based PLMs and kernel learning, we propose \textit{kernel-wise adapters}, namely \textit{Kernel-mix}, that utilize the kernel structure in self-attention to guide the assignment of the tunable parameters. These adapters follow guidelines established in classical kernel learning and enable separate parameter tuning for each attention head. Our empirical results, over a diverse set of natural language generation and understanding tasks, show that our proposed adapters can match or improve upon the strong performance of existing baselines.