In this paper, we propose a comprehensive benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios. Current explanation datasets often employ synthetic data with simple reasoning structures, and therefore cannot express more complex reasoning processes, such as a rebuttal to a reasoning step or the degree of certainty of the evidence. To this end, we propose a comprehensive logical reasoning explanation form. Building on the multi-hop chain of reasoning, the explanation form includes three main components: (1) the rebuttal condition under which a reasoning node can be challenged; (2) logical formulae that uncover the internal texture of reasoning nodes; (3) reasoning strength indicated by degrees of certainty. This fine-grained structure conforms to real logical reasoning scenarios and better fits the human cognitive process, but is simultaneously more challenging for current models. We evaluate the performance of the current best models on this new explanation form. The experimental results show that generating reasoning graphs remains a challenging task for current models, even with the help of giant pre-trained language models.
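To make the proposed explanation form concrete, the following is a minimal sketch of one way such a reasoning graph could be represented in code. It assumes each node carries an optional logical formula, a certainty degree, an optional rebuttal condition, and links to its supporting premises; all field names and the toy example are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of the fine-grained explanation form described above.
# Field names and values are illustrative, not taken from the paper.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ReasoningNode:
    statement: str                             # natural-language content of the node
    logical_formula: Optional[str] = None      # formula exposing the node's internal texture
    certainty: str = "certain"                 # reasoning strength, e.g. "certain" / "likely"
    rebuttal_condition: Optional[str] = None   # condition under which the node can be challenged
    premises: List["ReasoningNode"] = field(default_factory=list)  # multi-hop supporting nodes

# A toy two-hop chain: a conclusion supported by one premise,
# annotated with a rebuttal condition and a degree of certainty.
premise = ReasoningNode(
    statement="The ground is wet.",
    logical_formula="Wet(ground)",
)
conclusion = ReasoningNode(
    statement="It rained last night.",
    logical_formula="Wet(ground) -> Rained(last_night)",
    certainty="likely",
    rebuttal_condition="unless a sprinkler was running overnight",
    premises=[premise],
)
```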