Reasoning is a key pillar of human cognition and intelligence. Over the past decade, we have witnessed dramatic gains in natural language processing and unprecedented scaling of large language models. Recent work has characterized the capability of few-shot prompting techniques such as chain of thought to emulate human reasoning in large language models. This hallmark feature of few-shot prompting, combined with ever-larger language models, has opened a vista of possibilities for solving various tasks, such as math word problems, code completion, and commonsense reasoning. Chain-of-thought (CoT) prompting further improves model performance in a few-shot setting by supplying intermediate steps and urging the model to follow the same process. Despite its compelling performance, the genesis of the reasoning capability in these models remains underexplored. This work takes preliminary steps towards a deeper understanding of the reasoning mechanisms in large language models. Our work centers on querying the model while controlling for all but one of the components of a prompt: symbols, patterns, and text. We then analyze the performance divergence across the queries. Our results suggest that the presence of factual patterns in a prompt is not necessary for the success of CoT. Nonetheless, we show empirically that relying solely on patterns is also insufficient for high-quality results. We posit that text imbues patterns with commonsense knowledge and meaning. Our exhaustive empirical analysis provides qualitative examples of the symbiotic relationship between text and patterns. This systematic understanding of CoT enables us to devise a concise chain of thought, dubbed CCoT, in which text and patterns are pruned to retain only their key roles, while delivering an on-par or slightly higher task solve rate.
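To make the prompting setups discussed above concrete, the following Python sketch contrasts a standard chain-of-thought prompt with a pruned, concise variant. The exemplar, the question, and the exact pruning shown are illustrative assumptions, not the paper's actual prompts or benchmark:

```python
# Hypothetical few-shot exemplar with full chain-of-thought text:
# the answer walks through each intermediate step in prose.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

# A concise (CCoT-style) exemplar: the text is pruned down to the
# pattern-bearing core, i.e. the arithmetic steps and the final answer.
CCOT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: 2 * 3 = 6. 5 + 6 = 11. The answer is 11.\n\n"
)

def build_prompt(exemplar: str, question: str) -> str:
    """Prepend a few-shot exemplar to the target question."""
    return exemplar + "Q: " + question + "\nA:"

question = "A baker had 10 rolls and sold 4. How many rolls remain?"
cot_prompt = build_prompt(COT_EXEMPLAR, question)
ccot_prompt = build_prompt(CCOT_EXEMPLAR, question)

# The concise prompt keeps the answer pattern while using fewer tokens.
assert len(ccot_prompt) < len(cot_prompt)
```

Either string would then be sent to the language model; the claim in the abstract is that the pruned prompt preserves the pattern the model imitates while discarding text that does not contribute to it.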