In this article, we explore the shallow heuristics used by transformer-based pre-trained language models (PLMs) that are fine-tuned for natural language inference (NLI). To do so, we construct our own dataset based on syllogisms, and we evaluate a number of models' performance on it. We find evidence that the models rely heavily on certain shallow heuristics, picking up on symmetries and asymmetries between premise and hypothesis. We suggest that the lack of generalization observed in our study, which is becoming a topic of lively debate in the field, means that the PLMs are currently not learning NLI, but rather spurious heuristics.