While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (a property known as faithfulness). We propose Faithful CoT, a faithful-by-construction framework that decomposes a reasoning task into two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM and a deterministic solver respectively. We demonstrate the efficacy of our approach on 10 reasoning datasets from 4 diverse domains. It outperforms traditional CoT prompting on 9 out of the 10 datasets, with an average accuracy gain of 4.4 points on Math Word Problems, 1.9 on Planning, 4.0 on Multi-hop Question Answering (QA), and 18.1 on Logical Inference, under greedy decoding. Combined with self-consistency decoding, we achieve new state-of-the-art few-shot performance on 7 out of the 10 datasets, showing a strong synergy between faithfulness and accuracy.
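To make the two-stage decomposition concrete, below is a minimal Python sketch of the pipeline on a toy Math Word Problem. The `translate` function stands in for the few-shot prompted LM (stubbed here with a fixed output), and the deterministic solver is simply the Python interpreter executing the generated program; the function names and the stubbed reasoning chain are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch of the Faithful CoT two-stage pipeline (illustrative;
# function names and the stubbed LM output are assumptions, not the
# paper's exact implementation).

def translate(query: str) -> str:
    """Stage 1 (Translation): an LM maps the natural-language query to a
    symbolic reasoning chain -- here, an executable Python program.
    A real system would make a few-shot prompted LM call, e.g.:
        chain = lm.generate(FEW_SHOT_PROMPT + query)
    We stub the output for this sketch."""
    return (
        "# Alice has 3 apples and buys 2 bags of 4 apples each.\n"
        "initial = 3\n"
        "bought = 2 * 4\n"
        "answer = initial + bought\n"
    )

def solve(chain: str) -> object:
    """Stage 2 (Problem Solving): a deterministic solver executes the
    reasoning chain; here the Python interpreter plays that role, so the
    chain faithfully determines the answer by construction."""
    env: dict = {}
    exec(chain, {}, env)  # deterministic execution of the chain
    return env["answer"]

query = ("Alice has 3 apples and buys 2 bags of 4 apples each. "
         "How many apples does she have?")
chain = translate(query)  # NL query -> symbolic reasoning chain
print(solve(chain))       # reasoning chain -> answer: 11
```

Because the answer is produced solely by executing the chain, the chain cannot be a post-hoc rationalization: this is what makes the framework faithful by construction.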