Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior on answering questions about a story after performing two novel semantic interventions: deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions in a significant number of cases (~50% for deletion intervention, and a ~20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that mitigates the undesirable effects of deletion intervention by a significant margin (from ~50% to ~6%). We analyze the inner workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. However, we show that this training does not attenuate other aspects of semantic unfaithfulness, such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models achieve very high performance on the predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.
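To make the intervention protocol concrete, the following is a minimal sketch of a deletion intervention, not the paper's actual experimental setup: the story, question, and the "deepset/roberta-base-squad2" checkpoint are illustrative choices. The idea is to query an extractive QA model on a story, remove the sentence that supports the gold answer, and query again; a semantically faithful model should no longer recover the original answer once the supporting content is gone.

```python
# Minimal sketch of a deletion intervention on extractive QA.
# Model, story, and question are illustrative, not from the paper.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

story = (
    "Sam took his umbrella because rain was forecast. "
    "He left the house at noon and walked to the station."
)
question = "Why did Sam take his umbrella?"

# Prediction on the intact story.
original = qa(question=question, context=story)

# Deletion intervention: remove the sentence supporting the answer,
# then ask the same question again.
supporting = "Sam took his umbrella because rain was forecast. "
deleted = qa(question=question, context=story.replace(supporting, ""))

# A negation intervention instead rewrites the supporting sentence,
# e.g. "... because no rain was forecast", and checks whether the
# model's answer tracks the reversed semantic content.

# A faithful model should not still produce "rain was forecast"
# (or should do so only with much lower confidence) after deletion.
print(original["answer"], original["score"])
print(deleted["answer"], deleted["score"])
```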