Acquiring factual knowledge with Pretrained Language Models (PLMs) has attracted increasing attention, and PLMs have shown promising performance on many knowledge-intensive tasks. This strong performance has led the community to believe that the models possess a modicum of reasoning competence rather than merely memorising the knowledge. In this paper, we conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs. Through a series of controlled experiments, we report two main findings: (i) PLMs do not adequately generalise learned logic rules and behave inconsistently under simple adversarial surface-form edits; (ii) while fine-tuning PLMs for deductive reasoning does improve their performance on reasoning over unseen knowledge facts, it causes them to catastrophically forget previously learnt knowledge. Our results suggest that PLMs cannot yet perform reliable deductive reasoning, demonstrating the importance of controlled examination and probing of PLMs' reasoning abilities; by reaching beyond (misleading) task performance, we reveal that PLMs are still far from human-level reasoning capabilities, even on simple deductive tasks.