Machine learning models often generalize poorly to out-of-distribution (OOD) data as a result of relying on features that are spuriously correlated with the label during training. Recently, the technique of Invariant Risk Minimization (IRM) was proposed to learn predictors that use only invariant features by conserving the feature-conditioned class expectation $\mathbb{E}_e[y|f(x)]$ across environments. However, more recent studies have demonstrated that IRM can fail in various task settings. Here, we identify a fundamental flaw of the IRM formulation that causes this failure. We then introduce a complementary notion of invariance, MRI, based on conserving the class-conditioned feature expectation $\mathbb{E}_e[f(x)|y]$ across environments, which corrects for the flaw in IRM. Further, we introduce a simplified, practical version of the MRI formulation called MRI-v1. We note that this constraint is convex, which confers an advantage over the practical version of IRM, IRM-v1, whose constraints are non-convex. We prove that in a general linear problem setting, MRI-v1 can guarantee invariant predictors given sufficiently many environments. We also empirically demonstrate that MRI strongly outperforms IRM and consistently achieves near-optimal OOD generalization in image-based nonlinear problems.
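To make the conserved quantity concrete, the following is a minimal sketch of an MRI-style penalty that encourages the class-conditioned feature expectation $\mathbb{E}_e[f(x)|y]$ to agree across environments. It is written under stated assumptions (a learned featurizer producing per-environment feature batches, every class present in every environment); the function name and signature are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of an MRI-style penalty (not the paper's code):
# penalize disagreement of per-environment estimates of E_e[f(x)|y].
import torch


def mri_penalty(features_per_env, labels_per_env, num_classes):
    """features_per_env: list of (n_e, d) feature tensors, one per environment.
    labels_per_env:   list of (n_e,) integer label tensors.
    Assumes every class appears in every environment (else means are NaN).
    Returns the summed squared deviation of each environment's
    class-conditioned feature means from their cross-environment average.
    """
    env_class_means = []
    for feats, labels in zip(features_per_env, labels_per_env):
        # Per-class feature means: empirical estimate of E_e[f(x)|y=c].
        means = torch.stack(
            [feats[labels == c].mean(dim=0) for c in range(num_classes)]
        )  # shape: (num_classes, d)
        env_class_means.append(means)

    stacked = torch.stack(env_class_means)        # (num_envs, num_classes, d)
    target = stacked.mean(dim=0, keepdim=True)    # cross-environment average
    # Quadratic (hence convex) in the conditional means, in contrast to
    # IRM-v1's gradient-norm penalty, which yields non-convex constraints.
    return ((stacked - target) ** 2).sum()
```

In training, such a term would be added to the pooled empirical risk with a penalty weight, so the featurizer is pushed toward representations whose class-conditioned means are stable across environments.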