Large language models have demonstrated a surprising ability to perform in-context learning: such models can solve numerous downstream tasks directly by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can suffer from high instability due to variations in the training examples, example order, and prompt format. Constructing an appropriate prompt is therefore essential for improving the performance of in-context learning. In this paper, we revisit this problem from the view of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attribute. We then empirically show that prompts with higher bias consistently lead to unsatisfactory predictive quality. Based on this observation, we propose a novel greedy search strategy to identify a near-optimal prompt that improves the performance of in-context learning. We conduct comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
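To make the two core ideas concrete, the sketch below illustrates one plausible instantiation: predictive bias is measured as the KL divergence between the model's label distribution on a content-free input and the uniform distribution, and a greedy search grows the prompt by repeatedly appending the candidate example that most reduces this bias. Both the metric and the `bias_of_prompt` callback here are illustrative assumptions, not the paper's exact definitions.

```python
import math

def predictive_bias(label_probs):
    """Illustrative bias metric: KL divergence between the model's label
    distribution on a content-free input (e.g. "N/A") and the uniform
    distribution. Higher values indicate a more biased prompt.
    (Assumed formulation; the paper's exact metric may differ.)"""
    uniform = 1.0 / len(label_probs)
    return sum(p * math.log(p / uniform) for p in label_probs if p > 0)

def greedy_prompt_search(candidates, bias_of_prompt, max_len=4):
    """Greedily grow a prompt: at each step append the candidate example
    that yields the lowest predictive bias for the resulting prompt,
    stopping when no remaining candidate reduces the bias further.
    `bias_of_prompt` is a hypothetical callback that queries the model
    and scores a list of in-context examples."""
    prompt, remaining = [], list(candidates)
    while remaining and len(prompt) < max_len:
        best = min(remaining, key=lambda ex: bias_of_prompt(prompt + [ex]))
        if prompt and bias_of_prompt(prompt + [best]) >= bias_of_prompt(prompt):
            break  # no candidate improves on the current prompt
        prompt.append(best)
        remaining.remove(best)
    return prompt
```

For instance, with a toy scorer `lambda p: abs(sum(p) - 3)` over candidates `[1, 2, 3]`, the search picks `3` first (bias 0) and then stops, since adding any further example only increases the bias.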