Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant to solving the task. In this work, we investigate the distractibility of large language models, i.e., how a model's problem-solving accuracy is influenced by irrelevant context. In particular, we introduce Grade-School Math with Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant information in the problem description. We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models, and find that model performance drops dramatically when irrelevant information is included. We also identify several approaches for mitigating this deficiency, such as decoding with self-consistency and adding to the prompt an instruction that tells the language model to ignore the irrelevant information.
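As a concrete illustration of the two mitigations named above, the sketch below combines an "ignore irrelevant information" instruction with self-consistency decoding: sample several chain-of-thought completions at a nonzero temperature and take a majority vote over the extracted final answers. This is a minimal sketch, not the paper's implementation; the `sample_completion(prompt, temperature)` call is a hypothetical stand-in for whatever text-generation API is available, and the exact instruction wording and answer-extraction rule are illustrative assumptions.

```python
from collections import Counter
import re

# One possible wording of the mitigation instruction (illustrative).
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the question."

def extract_answer(completion: str):
    """Pull the final number out of a chain-of-thought completion
    (a simple heuristic; the real extraction rule may differ)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def self_consistent_answer(question: str, sample_completion, n_samples: int = 20):
    """Decode with self-consistency: sample several reasoning paths at a
    nonzero temperature and return the majority-vote final answer.

    `sample_completion` is a hypothetical callable mapping
    (prompt, temperature) to a generated string.
    """
    prompt = f"{IGNORE_INSTRUCTION}\n\nQ: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        completion = sample_completion(prompt, temperature=0.7)
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    return Counter(answers).most_common(1)[0][0] if answers else None
```

Majority voting helps here because irrelevant context tends to derail only some sampled reasoning paths; aggregating over many samples recovers the answer the consistent paths agree on.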