Prefix-tuning, or more generally continuous prompt tuning, has become an essential paradigm of parameter-efficient transfer learning. With a large pre-trained language model (PLM), prefix-tuning can obtain strong performance by training only a small number of parameters. In this paper, we propose to understand and further develop prefix-tuning through the kernel lens. Specifically, we draw an analogy between \textit{prefixes} and \textit{inducing variables} in kernel methods and hypothesize that \textit{prefixes} serving as \textit{inducing variables} would improve the overall mechanism. From the kernel estimator perspective, we suggest a new variant of prefix-tuning -- \textit{inducer-tuning} -- which shares the exact mechanism of prefix-tuning while leveraging the residual form found in adapter-tuning; this mitigates the initialization issue in prefix-tuning. Through comprehensive empirical experiments on natural language understanding and generation tasks, we demonstrate that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
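As an illustrative sketch of the mechanism alluded to above (the symbols $\tilde{k}_l$, $\tilde{v}_l$, and $g_{\theta}$ are our own notation, not the paper's), self-attention for a query $q$ over keys $k_i$ and values $v_i$ can be read as a kernel estimator with kernel $\exp(q^{\top}k/\sqrt{d})$, and prefix-tuning augments that estimator with $m$ trained key--value pairs:
\[
\mathrm{Attn}(q) \;=\; \frac{\sum_{i}\exp\!\big(q^{\top}k_i/\sqrt{d}\big)\,v_i \;+\; \sum_{l=1}^{m}\exp\!\big(q^{\top}\tilde{k}_l/\sqrt{d}\big)\,\tilde{v}_l}{\sum_{i}\exp\!\big(q^{\top}k_i/\sqrt{d}\big) \;+\; \sum_{l=1}^{m}\exp\!\big(q^{\top}\tilde{k}_l/\sqrt{d}\big)},
\]
where the trained pairs $(\tilde{k}_l,\tilde{v}_l)$ play the role of \textit{inducing variables}. Under the residual form borrowed from adapter-tuning, one could, for instance, let the inducers depend on the query itself, e.g.\ $\tilde{k}(q)=q+g_{\theta}(q)$ with a small bottleneck network $g_{\theta}$, so that near initialization ($g_{\theta}\approx 0$) the estimator stays close to the frozen PLM; this is a sketch under our own assumptions rather than the exact parameterization developed in the paper.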