Pre-trained text-to-text transformers such as BART have achieved impressive performance across a range of NLP tasks. Recent work further shows that they can learn to generalize to novel tasks by including task descriptions as part of the source sequence and training the model with (source, target) examples. At test time, these fine-tuned models can make inferences on new tasks using the new task descriptions as part of the input. However, this approach has a potential limitation: the model learns to solve individual (source, target) examples (i.e., at the instance level), instead of learning to solve tasks by taking all examples within a task as a whole (i.e., at the task level). To this end, we introduce Hypter, a framework that improves a text-to-text transformer's ability to generalize to unseen tasks by training a hypernetwork to generate task-specific, lightweight adapters from task descriptions. Experiments on the ZEST dataset and a synthetic SQuAD dataset demonstrate that Hypter improves upon fine-tuning baselines. Notably, when using BART-Large as the main network, Hypter brings an 11.3% comparative improvement on the ZEST dataset.
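To make the core mechanism concrete, the following is a minimal PyTorch sketch of the idea described above, not the paper's implementation: the class name HyperAdapter, the pooled task-description embedding desc_emb, the two-layer hypernetwork, and the bottleneck adapter shape are illustrative assumptions. It shows how a hypernetwork can map a task description to the parameters of a lightweight adapter that is then applied, with a residual connection, to hidden states of the main text-to-text model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperAdapter(nn.Module):
    """Illustrative sketch (not the official Hypter code): a hypernetwork
    generates the weights of a small bottleneck adapter from a task-description
    embedding, and the adapter is applied to the main model's hidden states."""

    def __init__(self, desc_dim, hidden_dim, bottleneck_dim):
        super().__init__()
        self.h, self.b = hidden_dim, bottleneck_dim
        # Parameter count of one down-project/up-project adapter (weights + biases).
        n_params = 2 * hidden_dim * bottleneck_dim + bottleneck_dim + hidden_dim
        # Hypothetical hypernetwork: task-description embedding -> flat adapter parameters.
        self.hypernet = nn.Sequential(
            nn.Linear(desc_dim, desc_dim),
            nn.ReLU(),
            nn.Linear(desc_dim, n_params),
        )

    def forward(self, desc_emb, hidden_states):
        # desc_emb: (desc_dim,) pooled encoding of the task description.
        # hidden_states: (seq_len, hidden_dim) activations from one transformer layer.
        flat = self.hypernet(desc_emb)
        h, b = self.h, self.b
        # Slice the flat vector into the adapter's weight and bias tensors.
        w_down = flat[: h * b].view(b, h)
        b_down = flat[h * b : h * b + b]
        w_up = flat[h * b + b : h * b + b + b * h].view(h, b)
        b_up = flat[h * b + b + b * h :]
        # Adapter: down-project, nonlinearity, up-project, residual connection.
        z = F.relu(F.linear(hidden_states, w_down, b_down))
        return hidden_states + F.linear(z, w_up, b_up)


# Usage sketch: one generated adapter per task, shared across that task's examples.
if __name__ == "__main__":
    layer = HyperAdapter(desc_dim=64, hidden_dim=128, bottleneck_dim=16)
    desc_emb = torch.randn(64)          # stand-in for an encoded task description
    hidden_states = torch.randn(10, 128)  # stand-in for BART layer activations
    out = layer(desc_emb, hidden_states)
    print(out.shape)  # torch.Size([10, 128])
```

Because the generated adapter is a function of the task description, training on whole tasks (rather than isolated examples) lets the hypernetwork learn task-level behavior, while the adapter keeps the per-task parameter overhead small.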