Chain-of-thought (CoT) prompting advances the reasoning abilities of large language models (LLMs) and achieves superior performance on arithmetic, commonsense, and symbolic reasoning tasks. However, most CoT studies rely on carefully designed, human-annotated rationale chains to prompt the language model, which poses challenges for real-world applications where labeled training data is available but human-annotated rationale chains are not. This creates barriers to applying CoT prompting to these general tasks. This paper proposes a new strategy, Automate-CoT (Automatic Prompt Augmentation and Selection with Chain-of-Thought), that bypasses human engineering of CoT demonstrations by automatically augmenting rationale chains from a small labeled dataset and then pruning low-quality chains, using the labels, to construct a candidate pool of machine-generated rationale chains. Finally, it selects the optimal combination of several rationale chains from the pool for CoT prompting by employing a variance-reduced policy gradient strategy to estimate the significance of each example under a black-box language model. Automate-CoT enables quick adaptation of the CoT technique to different tasks. Experimental results demonstrate the effectiveness of our method, which achieves state-of-the-art results on arithmetic reasoning (+2.7\%), commonsense reasoning (+3.4\%), symbolic reasoning (+3.2\%), and non-reasoning tasks (+2.5\%). Our code will be available at https://github.com/shizhediao/automate-cot.
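To make the selection step concrete, below is a minimal sketch of a variance-reduced policy-gradient search over exemplar combinations, in the spirit of the strategy described above. It is not the paper's implementation: `evaluate_prompt` is a hypothetical placeholder for querying the black-box LLM with a candidate prompt and scoring it on a small dev set, and the leave-one-out baseline is one common variance-reduction choice assumed here for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def select_exemplars(pool_size, n_slots, evaluate_prompt, steps=200, batch=4, lr=1.0, rng=None):
    """Sketch: variance-reduced policy-gradient search over exemplar combinations.

    pool_size:        number of candidate machine-generated rationale chains
    n_slots:          number of exemplars placed in the CoT prompt
    evaluate_prompt:  hypothetical callable mapping a list of pool indices to a
                      scalar reward (e.g. dev-set accuracy from the black-box LM)
    """
    rng = rng or np.random.default_rng(0)
    logits = np.zeros((n_slots, pool_size))  # one categorical distribution per prompt slot

    for _ in range(steps):
        probs = np.array([softmax(row) for row in logits])
        # Sample a batch of exemplar combinations from the current distributions.
        samples = [[rng.choice(pool_size, p=probs[k]) for k in range(n_slots)]
                   for _ in range(batch)]
        rewards = np.array([evaluate_prompt(s) for s in samples], dtype=float)
        # Leave-one-out baseline: each sample is compared against the mean of the others.
        baseline = (rewards.sum() - rewards) / (batch - 1)

        grad = np.zeros_like(logits)
        for s, r, b in zip(samples, rewards, baseline):
            for k, idx in enumerate(s):
                one_hot = np.zeros(pool_size)
                one_hot[idx] = 1.0
                # REINFORCE-style gradient of log-probability, scaled by the advantage.
                grad[k] += (r - b) * (one_hot - probs[k])
        logits += lr * grad / batch

    # Return the most probable rationale chain for each slot.
    return [int(np.argmax(row)) for row in logits]
```

In this sketch, the gradient estimator only requires reward values from the black-box model, never its internals, which is why such a strategy suits API-only LLMs.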