Neural networks are known to be biased towards learning mechanisms that help identify \textit{spurious attributes}, yielding features that do not generalize well under distribution shifts. To understand and address this limitation, we study the geometry of neural network loss landscapes through the lens of \textit{mode connectivity}, the observation that minimizers of neural networks are connected via simple paths of low loss. Our work addresses two questions: (i) do minimizers that encode dissimilar mechanisms connect via simple paths of low loss? (ii) can fine-tuning a pretrained model help switch between such minimizers? We define a notion of \textit{mechanistic similarity} and demonstrate that lack of linear connectivity between two minimizers implies the corresponding models use dissimilar mechanisms for making their predictions. This property helps us demonstrate that naïve fine-tuning can fail to eliminate a model's reliance on spurious attributes. We thus propose a method for altering a model's mechanisms, named \textit{connectivity-based fine-tuning}, and validate its usefulness by inducing models invariant to spurious attributes.
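To make the linear-connectivity criterion concrete, below is a minimal sketch (not the authors' implementation) of how one might estimate the loss barrier along the straight-line path between two minimizers. It assumes two trained PyTorch models with identical architectures, a mean-reduced loss function, and a data loader; the function name \texttt{loss\_barrier} and the sampling of 11 interpolation points are illustrative choices, and in practice batch-norm statistics would typically be re-estimated at each interpolated point.

\begin{verbatim}
import copy
import torch

@torch.no_grad()
def loss_barrier(model_a, model_b, loss_fn, loader, device="cpu", n_points=11):
    """Estimate the loss barrier along the linear path between two minimizers.

    A near-zero barrier suggests the two models are linearly mode connected;
    a large barrier is taken here as a signal that they may rely on
    dissimilar mechanisms (an assumption following the abstract above).
    """
    params_a = [p.detach().clone() for p in model_a.parameters()]
    params_b = [p.detach().clone() for p in model_b.parameters()]
    probe = copy.deepcopy(model_a).to(device)

    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        # Interpolate weights: theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b
        for p, pa, pb in zip(probe.parameters(), params_a, params_b):
            p.copy_((1.0 - alpha) * pa + alpha * pb)
        probe.eval()
        total, count = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(probe(x), y).item() * len(y)
            count += len(y)
        losses.append(total / count)

    # Barrier: excess loss at the worst point on the path over the endpoints' average.
    endpoint_mean = 0.5 * (losses[0] + losses[-1])
    return max(losses) - endpoint_mean
\end{verbatim}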