It is increasingly common to collect observations of feature-response pairs from different environments. As a consequence, learned predictors must often be applied to data whose distribution differs from the training distribution. One principled approach is to adopt structural causal models to describe the training and test models, following the invariance principle, which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle may be violated in practical settings when the response itself is intervened on. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we introduce the invariant matching property (IMP), an explicit relation that captures interventions through an additional feature. This leads to an alternative form of invariance that enables a unified treatment of general interventions on the response. We analyze the asymptotic generalization errors of our method under both discrete and continuous environment settings, where the continuous case is handled by relating it to semiparametric varying coefficient models. We present algorithms that show competitive performance compared to existing methods over various experimental settings, including a COVID dataset.
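
The invariance principle and its failure under interventions on the response can be illustrated with a small simulation. The sketch below is not the method of the paper and does not implement the IMP. It assumes a hypothetical one-feature linear model, Y = beta * X + noise, and the names `simulate` and `ols_slope`, the coefficient values, and the sample sizes are all chosen here purely for illustration. An intervention on the feature leaves the regression of Y on X unchanged, whereas an intervention that alters the structural assignment of the response changes it.

```python
import random


def simulate(n, x_scale, beta, rng):
    """Draw n samples from a toy linear SCM X -> Y with Y = beta * X + noise.

    x_scale and beta are illustrative knobs: changing x_scale mimics an
    intervention on the feature, changing beta mimics one on the response.
    """
    xs = [rng.gauss(0.0, x_scale) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible

# Training environment: X has unit scale, structural coefficient is 2.
slope_train = ols_slope(*simulate(20000, 1.0, 2.0, rng))

# Test environment A: only X is intervened on (its scale changes).
slope_shift_x = ols_slope(*simulate(20000, 3.0, 2.0, rng))

# Test environment B: the response is intervened on (coefficient changes).
slope_shift_y = ols_slope(*simulate(20000, 1.0, 0.5, rng))

print("train:", round(slope_train, 2))
print("X intervened:", round(slope_shift_x, 2))
print("Y intervened:", round(slope_shift_y, 2))
```

In this toy setting the slope estimated in the training environment still applies when only X is shifted, but not when the assignment of Y changes, which is the situation the abstract describes as a violation of the invariance principle.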