The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance, or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to linear regression, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts; otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint, combined with invariance, helps address key failures when invariant features capture all the information about the label, while also retaining the existing successes when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.
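To make the combined objective concrete, the following is a minimal sketch of how an invariance penalty and an information-bottleneck surrogate could be added to per-environment risks. All names (`irm_penalty`, `ib_penalty`, `ib_irm_objective`) and the choice of representation variance as the bottleneck surrogate are illustrative assumptions for a linear model with logistic loss, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irm_penalty(logits, y):
    # IRMv1-style penalty: squared gradient of the logistic risk with respect
    # to a scalar multiplier w on the logits, evaluated at w = 1. Analytically,
    # d/dw mean(log(1 + exp(-y*w*logits))) = mean(-y * logits * sigmoid(-y*logits)).
    grad = np.mean(-y * logits * sigmoid(-y * logits))
    return grad ** 2

def ib_penalty(features):
    # Information-bottleneck surrogate (an assumption here): penalize the
    # variance of the learned representation to limit the information it carries.
    return np.mean(np.var(features, axis=0))

def ib_irm_objective(envs, phi, w, lam=1.0, gamma=1.0):
    # envs: list of (X, y) pairs per environment, with labels y in {-1, +1}.
    # phi: linear feature map (matrix); w: linear classifier on the features.
    total = 0.0
    for X, y in envs:
        z = X @ phi                                  # representation
        logits = z @ w
        risk = np.mean(np.log1p(np.exp(-y * logits)))  # logistic risk
        total += risk + lam * irm_penalty(logits, y) + gamma * ib_penalty(z)
    return total / len(envs)
```

The invariance term pushes the classifier to be simultaneously optimal across environments, while the bottleneck term discourages the representation from encoding more than the label requires; the coefficients `lam` and `gamma` trade these off against the empirical risk.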