When language models process syntactically complex sentences, do they use the abstract syntactic information present in these sentences in a manner consistent with the grammar of English, or do they rely solely on a set of heuristics? We propose AlterRep, a method to tackle this question. For any linguistic feature in a sentence, AlterRep allows us to generate counterfactual representations by altering how this feature is encoded, while leaving all other aspects of the original representation intact. Then, by measuring how a model's word predictions change when these counterfactual representations are used in different sentences, we can draw causal conclusions about the contexts, if any, in which the model uses the linguistic feature. Applying this method to study how BERT uses relative clause (RC) span information, we found that BERT uses information about RC spans during agreement prediction, and that it does so using the linguistically correct strategy. We also found that counterfactual representations generated for a specific RC subtype influenced number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries is encoded abstractly in BERT's representations.
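To make the idea of a counterfactual representation concrete, here is a minimal sketch under strong simplifying assumptions. The abstract does not specify the mechanism, so this assumes the feature is encoded linearly along a single direction `w` (for instance, one obtained from a linear probe). The function name `counterfactual`, the parameters `target_sign` and `alpha`, and the toy vectors are all hypothetical and chosen only for illustration; this is not necessarily the procedure the authors use.

```python
import numpy as np

def counterfactual(h, w, target_sign, alpha=1.0):
    """Return a copy of h whose component along w is set to a chosen sign.

    h           -- original representation vector
    w           -- assumed feature direction (hypothetical, e.g. from a probe)
    target_sign -- +1 or -1: which feature value the counterfactual encodes
    alpha       -- scaling of the altered component (illustrative knob)
    """
    w = w / np.linalg.norm(w)      # work with a unit direction
    proj = np.dot(h, w)            # scalar component of h along w
    rest = h - proj * w            # everything orthogonal to w stays intact
    return rest + target_sign * alpha * abs(proj) * w

# Toy example: flip the feature component of a 2-d vector.
h = np.array([3.0, 4.0])
w = np.array([1.0, 0.0])
h_cf = counterfactual(h, w, target_sign=-1, alpha=2.0)
```

In this toy setting only the component along `w` changes; the orthogonal part of `h` is preserved, mirroring the requirement that all other aspects of the representation remain intact. One would then feed the altered vector back into the model and compare its word predictions with the original ones.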