Neural language models exhibit impressive performance on a variety of tasks, but their internal reasoning may be difficult to understand. Prior art aims to uncover meaningful properties within model representations via probes, but it is unclear how faithfully such probes portray information that the models actually use. To overcome such limitations, we propose a technique, inspired by causal analysis, for generating counterfactual embeddings within models. In experiments testing our technique, we produce evidence that suggests some BERT-based models use a tree-distance-like representation of syntax in downstream prediction tasks.
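To make the idea of a counterfactual embedding concrete, below is a minimal sketch, not the authors' released code, of one way such embeddings could be generated: assuming a trained Hewitt-and-Manning-style linear structural probe (the matrix `probe_B`) whose squared projected distances approximate syntactic tree distances, we nudge a sentence's contextual embeddings by gradient descent until the probe reports the pairwise distances of a desired counterfactual parse (`target_distances`). All names and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch: gradient-based counterfactual embedding generation
# against a linear structural probe. Not the paper's implementation.
import torch

def counterfactual_embeddings(embeddings, probe_B, target_distances,
                              steps=50, lr=1e-2):
    """embeddings: (seq_len, dim) contextual vectors for one sentence.
    probe_B: (rank, dim) matrix of a trained structural probe.
    target_distances: (seq_len, seq_len) tree distances of the counterfactual parse.
    Returns embeddings modified so the probe reads out the counterfactual syntax."""
    z = embeddings.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        proj = z @ probe_B.T                       # project into probe space
        diff = proj.unsqueeze(0) - proj.unsqueeze(1)
        pred = (diff ** 2).sum(-1)                 # squared probe distances, (L, L)
        loss = ((pred - target_distances) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

The modified embeddings could then be substituted back into the model's forward pass to test whether downstream predictions shift in the direction the counterfactual parse would predict.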