Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that require high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus lacks the fast adaptability of humans. We notice that human reasoning follows a dual-process framework consisting of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the overall reasoning outcome is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for producing reasoning paths, and the verifiers supervise their evaluation in order to provide reliable feedback to the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvements over state-of-the-art methods, with up to a 9.8% increase over the best baselines.
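To make the generator-verifier interaction concrete, the sketch below shows a minimal generate-then-verify loop in the spirit described above: a generator (system 1) samples candidate reasoning paths and a verifier (system 2) scores them, with the scored candidates available as feedback. The function names and signatures here are hypothetical placeholders, not the paper's actual implementation or API.

```python
# Minimal sketch of a cooperative generate-then-verify loop (assumed interface,
# not the CoRe codebase). The generator and verifier are passed in as callables.

from typing import Callable, List, Tuple

def cooperative_solve(
    question: str,
    sample_reasoning_path: Callable[[str], str],   # system 1: generator (e.g. a PLM sampler), hypothetical
    score_path: Callable[[str, str], float],       # system 2: verifier returning a reliability score, hypothetical
    num_candidates: int = 8,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Sample several reasoning paths and return the one the verifier trusts most."""
    scored: List[Tuple[str, float]] = []
    for _ in range(num_candidates):
        path = sample_reasoning_path(question)             # system 1 proposes a solution path
        scored.append((path, score_path(question, path)))  # system 2 evaluates it
    best_path, _ = max(scored, key=lambda p: p[1])
    # The scored candidates can also serve as feedback (e.g. a training signal)
    # for the generator, closing the cooperative loop sketched in the abstract.
    return best_path, scored
```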