Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been suggested. However, formal guarantees for all of these works are severely lacking. In this paper, we present the first analysis of classification under the IRM objective, as well as these recently proposed alternatives, under a fairly natural and general model. In the linear case, we show simple conditions under which the optimal solution succeeds or, more often, fails to recover the optimal invariant predictor. We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution, which is precisely the issue that it was intended to solve. Thus, in this setting we find that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.