Large language models have demonstrated a surprising ability to perform in-context learning, i.e., these models can be directly applied to numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can suffer from high instability due to variations in the choice of training examples, their ordering, and the prompt format. Therefore, constructing an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the perspective of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt with respect to the labels or a given attribute. We then show empirically that prompts with higher bias consistently lead to unsatisfactory predictive quality. Based on this observation, we propose a novel greedy search strategy to identify a near-optimal prompt for improving the performance of in-context learning. We perform comprehensive experiments with mainstream state-of-the-art models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
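For concreteness, a minimal sketch of a bias-guided greedy prompt search is shown below. It assumes the predictive bias of a prompt is measured as the KL divergence between the model's label distribution on a content-free probe input and the uniform distribution; the function names (`predictive_bias`, `greedy_prompt_search`), the probe string, and the model interface are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given a full prompt string, the model returns
# a probability distribution over the candidate labels.
ModelFn = Callable[[str], Sequence[float]]


def predictive_bias(model: ModelFn, prompt: str, probe: str = "N/A") -> float:
    """KL divergence between the model's label distribution on a
    content-free probe input and the uniform distribution.
    A perfectly unbiased prompt scores 0; higher means more biased."""
    probs = model(prompt + probe)
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def greedy_prompt_search(model: ModelFn,
                         examples: List[str],
                         num_shots: int) -> Tuple[str, float]:
    """Greedily assemble a num_shots-example prompt: at each step,
    append the candidate example that minimizes the predictive bias
    of the resulting prompt."""
    assert 0 < num_shots <= len(examples)
    chosen: List[str] = []
    remaining = list(examples)
    best_bias = float("inf")
    for _ in range(num_shots):
        best_ex, best_bias = None, float("inf")
        for ex in remaining:
            candidate = "\n".join(chosen + [ex]) + "\n"
            bias = predictive_bias(model, candidate)
            if bias < best_bias:
                best_ex, best_bias = ex, bias
        chosen.append(best_ex)
        remaining.remove(best_ex)
    return "\n".join(chosen) + "\n", best_bias
```

Under these assumptions, each greedy step costs one model call per remaining candidate, so selecting k examples from a pool of n takes O(nk) calls rather than the combinatorial cost of exhaustively scoring every k-example prompt.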