Current pre-trained language models have enabled remarkable improvements in downstream tasks, but it remains difficult to distinguish the effects of statistical correlation from more systematic logical reasoning grounded in an understanding of the real world. In this paper we tease these factors apart by leveraging counterfactual conditionals, which force language models to predict unusual consequences based on hypothetical propositions. We introduce a set of tests drawn from psycholinguistic experiments, as well as larger-scale controlled datasets, to probe counterfactual predictions from a variety of popular pre-trained language models. We find that models are consistently able to override real-world knowledge in counterfactual scenarios, and that this effect is more robust when the baseline world knowledge is stronger; however, we also find that for most models this effect appears to be driven largely by simple lexical cues. When we mitigate the effects of both world knowledge and lexical cues in order to test knowledge of the linguistic nuances of counterfactuals, we find that only GPT-3 shows sensitivity to these nuances, though this sensitivity is also non-trivially affected by lexical associative factors.