Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning show that least-to-most prompting can generalize to problems more difficult than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN under any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to an accuracy of only 16% with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained with the entire training set containing over 15,000 examples. Prompts for all the tasks are included in the Appendix.
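The two-stage procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `llm` function is a hypothetical deterministic stub standing in for a real language model, and the prompt wording is invented for the example.

```python
def llm(prompt: str) -> str:
    # Hypothetical stub standing in for a language model completion call.
    # A real setup would send `prompt` to an actual model endpoint.
    if "break it down" in prompt:
        return "1. How long does each trip take? 2. How many trips fit?"
    return "(model answer)"

def least_to_most(question: str) -> str:
    # Stage 1: decomposition -- ask the model to reduce the complex
    # problem to a sequence of simpler subproblems.
    decomposition = llm(
        f"Q: {question}\nTo solve this, break it down into subproblems:"
    )
    subproblems = [s.strip() for s in decomposition.split("?") if s.strip()]

    # Stage 2: sequential solving -- answer each subproblem in order,
    # appending earlier subproblem/answer pairs to the context so that
    # later subproblems are solved with the help of earlier answers.
    context = f"Q: {question}\n"
    answer = ""
    for sub in subproblems:
        answer = llm(context + f"Subproblem: {sub}?\nA:")
        context += f"Subproblem: {sub}?\nA: {answer}\n"
    return answer  # the answer to the last subproblem answers the question
```

In contrast, chain-of-thought prompting would pose the whole question in a single prompt with worked reasoning in the exemplars; least-to-most differs in that the decomposition and the per-subproblem solving are separate model calls, which is what enables generalizing to harder problems than the exemplars.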