Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs. However, there has been little understanding of how the model learns and which aspects of the demonstrations contribute to end task performance. In this paper, we show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance on a range of classification and multi-choice tasks, consistently over 12 different models including GPT-3. Instead, we find that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of (1) the label space, (2) the distribution of the input text, and (3) the overall format of the sequence. Together, our analysis provides a new way of understanding how and why in-context learning works, while opening up new questions about how much can be learned from large language models through inference alone.
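To make the label-randomization manipulation concrete, the following is a minimal sketch of how such demonstrations might be constructed. It assumes a toy binary sentiment task; the label set, the "Review:/Sentiment:" template, and the example texts are illustrative stand-ins, not the paper's exact datasets or prompt format.

```python
import random

# Hypothetical label space for a toy sentiment task.
LABEL_SPACE = ["positive", "negative"]

# Illustrative (input, gold label) demonstrations.
demonstrations = [
    ("A riveting, beautifully shot film.", "positive"),
    ("The plot drags and the jokes fall flat.", "negative"),
    ("An instant classic with a stellar cast.", "positive"),
    ("Two hours I will never get back.", "negative"),
]

def build_prompt(demos, test_input, randomize_labels=False, seed=0):
    """Concatenate (input, label) demonstrations into an in-context prompt.

    With randomize_labels=True, each demonstration keeps its input text but
    receives a label drawn uniformly at random from the label space -- the
    manipulation the abstract reports barely hurts end task performance.
    """
    rng = random.Random(seed)
    lines = []
    for text, gold_label in demos:
        label = rng.choice(LABEL_SPACE) if randomize_labels else gold_label
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The test input is appended with its label left blank for the LM to fill.
    lines.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(lines)

gold_prompt = build_prompt(demonstrations, "A tender and surprising story.")
random_prompt = build_prompt(
    demonstrations, "A tender and surprising story.", randomize_labels=True
)
# Feeding both prompts to the same frozen LM over a test set and comparing
# accuracy contrasts the gold-label and random-label conditions.
```

Note that even under `randomize_labels=True`, the prompt still exposes the label space, the input distribution, and the input-label format, which the paper identifies as the key drivers of performance.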