Understanding natural language requires common sense, one aspect of which is the ability to discern the plausibility of events. While distributional models -- most recently pre-trained, Transformer language models -- have demonstrated improvements in modeling event plausibility, their performance still falls short of humans'. In this work, we show that Transformer-based plausibility models are markedly inconsistent across the conceptual classes of a lexical hierarchy, inferring that "a person breathing" is plausible while "a dentist breathing" is not, for example. We find this inconsistency persists even when models are softly injected with lexical knowledge, and we present a simple post-hoc method of forcing model consistency that improves correlation with human plausibility judgements.
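The kind of inconsistency described above can be probed with off-the-shelf language models. The following is a minimal illustrative sketch, not the plausibility model studied in this work: it uses GPT-2 sentence log-probability as a rough plausibility proxy to compare events whose subjects differ only in their position in a lexical hierarchy (the hypernym "person" vs. the hyponym "dentist").

```python
# Hedged sketch: GPT-2 average log-probability as a crude plausibility proxy.
# This is NOT the paper's plausibility model; it only illustrates how two
# events differing in conceptual class can be scored and compared.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(text: str) -> float:
    """Average token log-probability under GPT-2 (higher = judged more likely)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # negative mean cross-entropy over the sentence

for event in ["A person is breathing.", "A dentist is breathing."]:
    print(f"{event:30s} {sentence_log_prob(event):.3f}")

# A conceptually consistent model should not rate the hyponym ("dentist")
# as markedly less plausible than its hypernym ("person") for a property,
# like breathing, that holds of all people.
```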