Deductive reasoning over natural language is a challenging problem in NLP. In this work, we focus on proof generation: given a hypothesis and a set of supporting facts, the model generates a proof tree indicating how to deduce the hypothesis from the supporting facts. Compared to generating the entire proof in one shot, stepwise generation can better exploit compositionality and generalize to longer proofs, but has achieved limited success on real-world data. Existing stepwise methods struggle to generate proof steps that are both logically valid and relevant to the hypothesis. Instead, they tend to hallucinate invalid steps when given the hypothesis. In this paper, we present a novel stepwise method, NLProofS (Natural Language Proof Search), which learns to generate relevant steps conditioned on the hypothesis. At the core of our approach, we train an independent verifier to check the validity of the proof steps to prevent hallucination. Instead of generating steps greedily, we search for proofs maximizing a global proof score judged by the verifier. NLProofS achieves state-of-the-art performance on EntailmentBank and RuleTaker. Specifically, it improves the correctness of predicted proofs from 27.7% to 33.3% in the distractor setting of EntailmentBank, demonstrating the effectiveness of NLProofS in generating challenging human-authored proofs.
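To make the verifier-guided search concrete, the following is a minimal, self-contained sketch (not the authors' implementation): a stand-in prover proposes candidate steps, a stand-in verifier assigns each step a validity score, and the search keeps the proof whose global score is highest. The function names, the toy step proposer, and the use of the minimum step score as the global proof score are all illustrative assumptions; NLProofS's actual prover, verifier, and score aggregation differ in detail.

```python
# Toy sketch of verifier-guided stepwise proof search (illustrative only).
from dataclasses import dataclass, field
from typing import List, Tuple
import heapq


@dataclass(order=True)
class ProofState:
    neg_score: float  # negative score so heapq pops the highest-scoring state first
    steps: List[str] = field(compare=False, default_factory=list)
    facts: Tuple[str, ...] = field(compare=False, default_factory=tuple)


def propose_steps(facts, hypothesis):
    """Stand-in for the stepwise prover: enumerate (premises, conclusion) candidates."""
    return [((facts[i], facts[j]), f"({facts[i]}) & ({facts[j]})")
            for i in range(len(facts)) for j in range(i + 1, len(facts))]


def verify_step(premises, conclusion):
    """Stand-in for the independent verifier: return a validity score in [0, 1]."""
    return 0.9 if all(p in conclusion for p in premises) else 0.1


def proof_search(facts, hypothesis, max_depth=3, beam=5):
    """Search for the proof maximizing a global score (here: minimum step validity)."""
    frontier = [ProofState(neg_score=-1.0, steps=[], facts=tuple(facts))]
    best = None
    for _ in range(max_depth):
        next_frontier = []
        for state in frontier:
            for premises, conclusion in propose_steps(list(state.facts), hypothesis):
                # Global proof score: minimum of verifier scores along the proof.
                score = min(-state.neg_score, verify_step(premises, conclusion))
                new_state = ProofState(
                    neg_score=-score,
                    steps=state.steps + [f"{' & '.join(premises)} -> {conclusion}"],
                    facts=state.facts + (conclusion,),
                )
                if hypothesis in conclusion and (best is None or score > -best.neg_score):
                    best = new_state
                heapq.heappush(next_frontier, new_state)
        # Keep only the top-scoring states (beam search over proof states).
        frontier = [heapq.heappop(next_frontier) for _ in range(min(beam, len(next_frontier)))]
    return best


if __name__ == "__main__":
    result = proof_search(["it rains", "the ground gets wet when it rains"],
                          "the ground gets wet")
    print(result.steps if result else "no proof found")
```

The key design point this sketch mirrors is that step validity is judged by a model independent of the prover, and the search objective is a global score over the whole proof rather than a greedy per-step choice.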