Large language models have demonstrated a surprising ability to perform in-context learning: they can be directly applied to numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can be highly unstable with respect to the choice of training examples, their ordering, and the prompt format. Constructing an appropriate prompt is therefore essential for improving the performance of in-context learning. In this paper, we revisit this problem from the perspective of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes. We then show empirically that prompts with higher bias consistently lead to unsatisfactory predictive quality. Based on this observation, we propose a novel greedy search strategy to identify near-optimal prompts for improving the performance of in-context learning. We conduct comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
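To make the core idea concrete, the following is a minimal sketch of what a bias-guided greedy prompt search might look like; it is not the paper's implementation. It assumes a caller-supplied `label_distribution` function that queries the model with the current demonstrations on a content-free probe input and returns probabilities over the label words; the KL-from-uniform bias metric, the function names, and the mock scorer in the demo are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

Demo = Tuple[str, str]  # (input text, label word)


def predictive_bias(label_probs: Dict[str, float]) -> float:
    """One plausible bias metric: KL divergence of the prompt's label
    distribution on a content-free input from the uniform distribution.
    Zero means the prompt is unbiased across labels."""
    k = len(label_probs)
    return sum(p * math.log(p * k) for p in label_probs.values() if p > 0)


def greedy_prompt_search(
    candidates: List[Demo],
    k: int,
    label_distribution: Callable[[List[Demo]], Dict[str, float]],
) -> List[Demo]:
    """Greedily grow a prompt: at each step, add the candidate demonstration
    that minimizes the predictive bias of the resulting prompt."""
    chosen: List[Demo] = []
    pool = list(candidates)
    while pool and len(chosen) < k:
        best = min(
            pool,
            key=lambda ex: predictive_bias(label_distribution(chosen + [ex])),
        )
        chosen.append(best)
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy demo with a mock scorer standing in for a real LLM call: here the
    # prompt is "less biased" when its demonstration labels are balanced.
    def mock_label_distribution(demos: List[Demo]) -> Dict[str, float]:
        pos = sum(1 for _, y in demos if y == "positive") + 1
        neg = sum(1 for _, y in demos if y == "negative") + 1
        total = pos + neg
        return {"positive": pos / total, "negative": neg / total}

    pool = [
        ("great movie", "positive"),
        ("awful plot", "negative"),
        ("loved it", "positive"),
        ("boring", "negative"),
    ]
    print(greedy_prompt_search(pool, k=2, label_distribution=mock_label_distribution))
```

In practice, `label_distribution` would prompt the model (e.g., GPT-3) with the selected demonstrations followed by a content-free input such as "N/A" and read off the normalized probabilities of the label tokens; the greedy loop then trades off search cost against prompt quality.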