Meta-training, which fine-tunes a language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved zero-shot task generalization. However, meta-trained LMs still struggle to generalize to challenging tasks that contain novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative meta-training method that trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 3B-sized Flipped outperforms the 4-times-larger zero-shot T0-11B and even the 60-times-larger 3-shot GPT-3 (175B) on average by 1.8% and 3.1%, respectively. Flipped gives particularly large improvements on unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.
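To make the inference procedure concrete, below is a minimal sketch (not the authors' released implementation) of Flipped-style label selection: for each candidate label, score how likely the LM is to generate the task instruction given the input instance and that label, then pick the highest-scoring option. The backbone checkpoint, prompt format, and length handling are illustrative assumptions.

```python
# Minimal sketch of Flipped-style inference, assuming a T0-style seq2seq backbone.
# The model name and prompt concatenation below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")  # assumed backbone
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B").eval()

def flipped_score(instance: str, label: str, instruction: str) -> float:
    """Return log P(instruction | instance, label) under the seq2seq LM."""
    enc = tokenizer(f"{instance} {label}", return_tensors="pt")
    target = tokenizer(instruction, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=target)
    # out.loss is the mean negative log-likelihood per target token;
    # multiplying by the target length gives the sequence log-likelihood.
    return -out.loss.item() * target.shape[1]

def flipped_predict(instance: str, label_options: list[str], instruction: str) -> str:
    """Select the label option most likely to generate the task instruction."""
    scores = [flipped_score(instance, lab, instruction) for lab in label_options]
    return label_options[int(torch.tensor(scores).argmax())]

# Example usage (hypothetical sentiment task):
# flipped_predict("The movie was a delight.",
#                 ["positive", "negative"],
#                 "Is the sentiment of the review positive or negative?")
```

Note that this sketch scores the full instruction sequence for each label option; the paper's actual training objective and any unlikelihood or normalization terms are described in the method section and may differ from this simplification.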