Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and are generally known as excellent few-shot learners when given task-specific exemplars. Notably, chain-of-thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, has achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult System-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' few-shot learning ability, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performance on diverse benchmark reasoning tasks including arithmetic (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g., increasing the accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with the 175B-parameter InstructGPT model, as well as improvements of similar magnitude with another off-the-shelf large model, the 540B-parameter PaLM. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting that broad, high-level multi-task cognitive capabilities may be extracted by simple prompting. We hope our work not only serves as the minimal, strongest zero-shot baseline for these challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
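To make the prompting scheme concrete, the sketch below illustrates the two-stage pipeline the abstract describes: a reasoning-extraction prompt ending in the fixed trigger "Let's think step by step.", followed by an answer-extraction prompt that appends the generated rationale. The `llm_complete` function is a hypothetical stand-in for any text-completion LLM API (e.g., a client for InstructGPT or PaLM), and the answer-extraction trigger shown is one example phrasing for numeric answers.

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a text-completion LLM
    (e.g., InstructGPT or PaLM); swap in a real API client here."""
    raise NotImplementedError("plug in an LLM completion call")


def zero_shot_cot(question: str) -> str:
    # Stage 1: reasoning extraction. Append the single fixed trigger
    # "Let's think step by step." and let the model generate a rationale.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    rationale = llm_complete(reasoning_prompt)

    # Stage 2: answer extraction. Feed the rationale back together with a
    # format-specific trigger so the final answer is easy to parse.
    answer_prompt = (
        f"{reasoning_prompt} {rationale}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return llm_complete(answer_prompt).strip()
```

Note that no task-specific exemplars appear anywhere in either prompt; the same template is reused across all benchmarks, which is what makes the method zero-shot.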