With a handful of demonstration examples, large-scale language models exhibit a strong capability to perform various tasks by in-context learning from these examples, without any fine-tuning. We demonstrate that in-context learning performance can be highly unstable across different samples of examples, revealing idiosyncrasies in how language models acquire information. We formulate example selection for in-context learning as a sequential decision problem, and propose a reinforcement learning algorithm to identify generalizable policies for selecting demonstration examples. For GPT-2, our learned policies generalize strongly to tasks unseen during training, with an average improvement of $5.8\%$. Examples selected by our learned policies even achieve a small improvement on GPT-3 Ada. However, the improvement diminishes on larger GPT-3 models, suggesting emergent capabilities of large language models.
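To make the sequential-decision framing concrete, here is a minimal sketch of selecting $K$ demonstrations from a candidate pool with a tabular policy and a bandit-style reward update. This is an illustrative toy, not the paper's actual algorithm: the names (POOL_SIZE, K, evaluate_prompt) are placeholders, and evaluate_prompt is a stub standing in for dev-set accuracy of the prompted language model.

```python
import math
import random

POOL_SIZE = 20   # candidate demonstration examples (assumed pool)
K = 4            # demonstrations per prompt
EPISODES = 500
LR = 0.1

# Tabular policy: one learned preference score per candidate example.
prefs = [0.0] * POOL_SIZE

def softmax_sample(scores, exclude):
    """Sample one index from a softmax over scores, skipping already-chosen ones."""
    candidates = [i for i in range(len(scores)) if i not in exclude]
    weights = [math.exp(scores[i]) for i in candidates]
    r, acc = random.random() * sum(weights), 0.0
    for i, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return i
    return candidates[-1]

def evaluate_prompt(selected):
    """Placeholder reward; in practice this would prompt GPT-2/GPT-3
    with the selected demonstrations and score a validation set."""
    return sum(math.sin(i) for i in selected) / K  # arbitrary stub signal

for _ in range(EPISODES):
    chosen = []
    for _ in range(K):                 # sequential selection: one example per step
        chosen.append(softmax_sample(prefs, set(chosen)))
    reward = evaluate_prompt(chosen)   # episode reward for the full prompt
    for i in chosen:                   # crude reward-weighted preference update
        prefs[i] += LR * reward

best = sorted(range(POOL_SIZE), key=lambda i: -prefs[i])[:K]
print("learned top-K examples:", best)
```

The key design choice this sketch illustrates is that the reward arrives only after a full prompt is assembled, which is what makes selection a sequential decision problem rather than per-example ranking.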