Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). By providing a series of reasoning steps in the demonstrations, CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations: prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are far more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting and open up new questions regarding LLMs' capability to learn to reason in context.
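To make the setup concrete, here is a minimal sketch of a few-shot CoT prompt with an invalid demonstration. The helper, the demonstration text, and the test question are hypothetical illustrations, not the paper's actual prompts or benchmarks; the point is that the demonstrated rationale is relevant to its query and keeps a coherent step ordering, yet its intermediate arithmetic is wrong.

```python
def build_cot_prompt(demos, question):
    """Assemble a few-shot chain-of-thought prompt from
    (question, rationale, answer) demonstration triples."""
    parts = []
    for q, rationale, answer in demos:
        parts.append(f"Q: {q}\nA: {rationale} The answer is {answer}.")
    # The test question ends with "A:" so the model continues
    # with its own rationale and answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# A hypothetical *invalid* demonstration: the steps are relevant to the
# query and correctly ordered, but the arithmetic (9 - 2 = 6) is wrong.
invalid_demo = (
    "Leah had 9 apples and ate 2. How many apples are left?",
    "Leah started with 9 apples. She ate 2, so 9 - 2 = 6 apples remain.",
    "6",
)

prompt = build_cot_prompt(
    [invalid_demo],
    "Jack has 5 pens and buys 3 more. How many pens does he have?",
)
print(prompt)
```

Swapping the invalid rationale for a correct one (or for an irrelevant or shuffled one) is then a controlled way to probe which aspects of the demonstrations the model actually relies on.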