Prompt-Tuning is a new paradigm for fine-tuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps in which the hyper-prompts serve as task-global memories for the queries to attend to, while at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ additional task-conditioning parameters, achieving both parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt achieves superior performance over strong T5 multi-task learning baselines and parameter-efficient adapter variants, including Prompt-Tuning and HyperFormer++, on the Natural Language Understanding benchmarks GLUE and SuperGLUE across many model sizes.
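As a rough illustration of the idea (not the paper's implementation), the sketch below shows one way task-conditioned prompts could interact with self-attention: a toy two-layer hypernetwork maps a task embedding to prompt keys and values, which are prepended to the layer's keys and values so every query can attend to them as task-global memory. All shapes, the hypernetwork structure, and the single-head formulation here are illustrative assumptions.

```python
# Minimal sketch of prompt-conditioned self-attention (illustrative only).
# A toy hypernetwork turns a task embedding into "hyper-prompt" keys/values,
# which are concatenated to the layer's keys and values before attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hyper_prompt(task_emb, w1, w2, prompt_len, d_model):
    """Toy hypernetwork: task embedding -> prompt keys and values."""
    h = np.tanh(task_emb @ w1)          # (d_task,) -> (d_hidden,)
    flat = h @ w2                       # (d_hidden,) -> (2 * prompt_len * d_model,)
    pk, pv = np.split(flat, 2)
    return pk.reshape(prompt_len, d_model), pv.reshape(prompt_len, d_model)

def prompt_attention(q, k, v, prompt_k, prompt_v):
    """Single-head attention with keys/values prefixed by hyper-prompts."""
    k = np.concatenate([prompt_k, k], axis=0)    # (prompt_len + seq, d_model)
    v = np.concatenate([prompt_v, v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (seq, prompt_len + seq)
    return softmax(scores) @ v                   # (seq, d_model)

# Example with made-up sizes: 2 prompt tokens, 4 sequence positions, d_model = 8.
rng = np.random.default_rng(0)
d_task, d_hidden, d_model, prompt_len, seq = 4, 16, 8, 2, 4
task_emb = rng.normal(size=(d_task,))
w1 = rng.normal(size=(d_task, d_hidden))
w2 = rng.normal(size=(d_hidden, 2 * prompt_len * d_model))
pk, pv = hyper_prompt(task_emb, w1, w2, prompt_len, d_model)
q, k, v = (rng.normal(size=(seq, d_model)) for _ in range(3))
out = prompt_attention(q, k, v, pk, pv)          # shape (4, 8)
```

Because only the hypernetwork and task embeddings are task-specific, the additional parameter count stays small while the backbone Transformer is shared across tasks.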