We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain-of-thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
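The mechanics of chain-of-thought prompting are simple: each few-shot exemplar pairs a question with the intermediate reasoning steps leading to its answer, and the model is then asked a new question in the same format. The sketch below illustrates this structure in Python; the exemplar wording and helper function are illustrative, not the paper's exact prompts.

```python
# A minimal sketch of chain-of-thought prompting: a few-shot prompt where each
# exemplar shows intermediate reasoning steps before stating the final answer.
# The exemplar text and function name here are illustrative assumptions.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

def build_cot_prompt(exemplars, question):
    """Concatenate chain-of-thought exemplars, then append the new question.

    The trailing "A:" invites the model to continue with its own reasoning
    chain rather than an immediate answer.
    """
    return "\n\n".join(exemplars) + "\n\nQ: " + question + "\nA:"

prompt = build_cot_prompt(
    [COT_EXEMPLAR],
    "A juggler has 16 balls. Half of the balls are golf balls. "
    "How many golf balls are there?",
)
print(prompt)
```

In contrast, standard few-shot prompting would place only the final answer ("The answer is 11.") after each "A:", omitting the reasoning steps; the paper's claim is that including those steps is what unlocks the reported gains at scale.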