Although chain-of-thought prompting has shown impressive results on many natural language reasoning tasks, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompt. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea is to break down a complex problem into a series of simpler subproblems and then solve them in sequence, where solving each subproblem is facilitated by the answers to previously solved subproblems. Experiments on symbolic manipulation, compositional generalization, and math reasoning show that least-to-most prompting can generalize to examples harder than those seen in the prompt, outperforming chain-of-thought prompting by a large margin. A notable result is that the GPT-3 code-davinci-002 model with least-to-most prompting solves the SCAN benchmark under any split (including the length split) with 99.7% accuracy using just 14 exemplars, compared to 16.2% accuracy with chain-of-thought prompting; this is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the full training set of more than 15,000 examples.
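To make the two-stage procedure concrete, here is a minimal Python sketch of the decompose-then-solve loop described above. It assumes a generic text-completion API (`call_model` is a hypothetical placeholder, not a real library call) and that the few-shot decomposition and solving prompts contain paper-style demonstration exemplars; the subproblem parsing is likewise an assumption about the model's output format.

```python
# Minimal sketch of least-to-most prompting, under stated assumptions.
# `call_model` is a hypothetical stand-in for any LLM completion API.

DECOMPOSE_PROMPT = "..."  # exemplars showing how to reduce a problem to subproblems
SOLVE_PROMPT = "..."      # exemplars showing how to answer a single subproblem


def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion API."""
    raise NotImplementedError


def least_to_most(problem: str) -> str:
    # Stage 1: decomposition. Ask the model to list the subproblems,
    # assumed here to come back one per line.
    decomposition = call_model(f"{DECOMPOSE_PROMPT}\nQ: {problem}\nA:")
    subproblems = [s.strip() for s in decomposition.splitlines() if s.strip()]

    # Stage 2: sequential solving. Each solved subproblem (question plus
    # answer) is appended to the context, so later subproblems can build
    # on earlier answers.
    context = SOLVE_PROMPT
    answer = ""
    for sub in subproblems:
        answer = call_model(f"{context}\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"

    # The answer to the final subproblem answers the original problem.
    return answer
```

The essential design choice this sketch illustrates is that the context grows monotonically: each subproblem is solved with all previously solved question-answer pairs visible, which is what lets the method handle problems harder than any single demonstration.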