Probing strategies have been shown to detect the presence of various linguistic features in large language models, including semantic features intermediate to the "natural logic" fragment of the Natural Language Inference (NLI) task. In the case of natural logic, the relation between the intermediate features and the entailment label is explicitly known. This makes it a ripe setting for interventional studies on NLI models' representations, allowing for stronger causal conjectures and a deeper critical analysis of interventional probing methods. In this work, we carry out new and existing representation-level interventions to investigate the effect of these semantic features on NLI classification: we perform amnesic probing (which removes features as directed by learned linear probes) and introduce the mnestic probing variation (which forgets all dimensions except the probe-selected ones). Furthermore, we delve into the limitations of these methods and outline some pitfalls that have been obscuring the effectiveness of interventional probing studies.
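
To make the contrast between the two interventions concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single linear probe with weight matrix `W` (shape `k x d`) has already been trained on `d`-dimensional representations `X`. The function names, the random data, and the one-shot projection are all illustrative assumptions; amnesic probing in the literature is typically built on an iterative nullspace projection procedure, which is not reproduced here.

```python
import numpy as np

def row_space_projector(W, tol=1e-10):
    """Orthogonal projector (d x d) onto the row space of probe weights W (k x d)."""
    _, s, vt = np.linalg.svd(W, full_matrices=False)
    basis = vt[s > tol]        # orthonormal rows spanning the probe-selected directions
    return basis.T @ basis

def amnesic(X, W):
    """Remove the probe-selected directions (project onto the probe's nullspace)."""
    P = row_space_projector(W)
    return X @ (np.eye(W.shape[1]) - P)

def mnestic(X, W):
    """Keep only the probe-selected directions (project onto the probe's row space)."""
    return X @ row_space_projector(W)

# Hypothetical stand-ins for model representations and learned probe weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # 5 examples, 8-dimensional representations
W = rng.normal(size=(3, 8))   # linear probe with 3 output classes

X_amn = amnesic(X, W)
X_mne = mnestic(X, W)
```

Under these assumptions, the probe's outputs on `X_amn` are zero up to numerical precision, its outputs on `X_mne` match those on `X`, and the two projected representations sum back to `X`. In this simplified view the two interventions are complementary.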