Transformers have been shown to perform deductive reasoning over logical rulebases containing rules and statements written in natural English. While this progress is promising, it remains unclear whether these models truly perform logical reasoning by understanding the underlying logical semantics of the language. To this end, we propose RobustLR, a suite of evaluation datasets that probe the robustness of these models to minimal logical edits in rulebases and to standard logical equivalence conditions. In experiments with RoBERTa and T5, we find that models trained in prior works do not perform consistently across the different perturbations in RobustLR, showing that they are not robust to the proposed logical perturbations. We further find that the models have particular difficulty learning the logical negation and disjunction operators. Overall, our evaluation sets expose several shortcomings of deductive-reasoning language models, which can ultimately inform the design of better models for logical reasoning over natural language. All datasets and code have been made publicly available.
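To make the notion of a logical-equivalence perturbation concrete, below is a minimal sketch, assuming a toy rulebase representation of our own devising: rewriting an implication as its contrapositive (A → B ≡ ¬B → ¬A) should leave a sound deductive reasoner's prediction unchanged, so divergent model outputs on the two forms signal a lack of robustness. The `Rule`, `contrapositive`, and `to_sentence` names here are hypothetical illustrations, not the RobustLR API.

```python
# A minimal sketch (not the RobustLR codebase) of one logical-equivalence
# perturbation: rewriting a rule as its contrapositive. A model that truly
# grasps logical semantics should reach the same deduction on the original
# and the perturbed rulebase.

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A toy implication 'if antecedent then consequent' over English atoms."""
    antecedent: str                    # e.g. "the cat is nice"
    consequent: str                    # e.g. "the cat is kind"
    antecedent_negated: bool = False
    consequent_negated: bool = False


def render(atom: str, negated: bool) -> str:
    """Render an atom, flipping its polarity (hypothetical surface form)."""
    return f"it is not the case that {atom}" if negated else atom


def to_sentence(rule: Rule) -> str:
    """Verbalize a rule as an English conditional sentence."""
    return (f"If {render(rule.antecedent, rule.antecedent_negated)}, "
            f"then {render(rule.consequent, rule.consequent_negated)}.")


def contrapositive(rule: Rule) -> Rule:
    """A -> B is logically equivalent to (not B) -> (not A)."""
    return Rule(
        antecedent=rule.consequent,
        consequent=rule.antecedent,
        antecedent_negated=not rule.consequent_negated,
        consequent_negated=not rule.antecedent_negated,
    )


if __name__ == "__main__":
    rule = Rule("the cat is nice", "the cat is kind")
    print(to_sentence(rule))                  # original rule
    print(to_sentence(contrapositive(rule)))  # logically equivalent rewrite
```

A consistency check under this kind of perturbation simply compares a model's label on a rulebase containing the original rule against its label when the rule is replaced by the contrapositive; any disagreement counts against the model's robustness.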