State-of-the-art deep-learning-based approaches to Natural Language Processing (NLP) are credited with various capabilities that involve reasoning with natural language texts. In this paper we carry out a large-scale empirical study investigating the detection of formally valid inferences in controlled fragments of natural language for which the satisfiability problem becomes increasingly complex. We find that, while transformer-based language models perform surprisingly well in these scenarios, a deeper analysis reveals that they appear to overfit to superficial patterns in the data rather than acquiring the logical principles governing the reasoning in these fragments.