Robustness to certain distribution shifts is a key requirement in many ML applications. Often, relevant distribution shifts can be formulated in terms of interventions on the process that generates the input data. Here, we consider the problem of learning a predictor whose risk across such shifts is invariant. A key challenge to learning such risk-invariant predictors is shortcut learning, or the tendency for models to rely on spurious correlations in practice, even when a predictor based on shift-invariant features could achieve optimal i.i.d. generalization in principle. We propose a flexible, causally-motivated approach to address this challenge. Specifically, we propose a regularization scheme that makes use of auxiliary labels for potential shortcut features, which are often available at training time. Drawing on the causal structure of the problem, we enforce a conditional independence between the representation used to predict the main label and the auxiliary labels. We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors that generalize well both in-distribution and under distribution shifts, and does so with better sample efficiency than standard regularization or weighting approaches.
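To make the proposed regularization concrete, the following is a minimal sketch of one way such a conditional independence penalty could be implemented. It assumes a binary auxiliary (shortcut) label `v`, uses a Gaussian-kernel MMD as the discrepancy measure (the abstract does not specify a particular metric, so this choice is illustrative), and the names `encoder`, `head`, and the weight `alpha` are hypothetical.

```python
import torch
import torch.nn as nn

def gaussian_mmd(x, y, sigma=1.0):
    """Squared MMD between two samples under a Gaussian (RBF) kernel."""
    def kernel(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def conditional_independence_penalty(phi, y, v, sigma=1.0):
    """Penalize dependence between the representation phi and the auxiliary
    label v, conditional on the main label y: within each class of y, compare
    the distribution of phi across the two auxiliary-label groups."""
    penalty = phi.new_zeros(())
    for label in torch.unique(y):
        mask = (y == label)
        phi_v0 = phi[mask & (v == 0)]
        phi_v1 = phi[mask & (v == 1)]
        if len(phi_v0) > 1 and len(phi_v1) > 1:
            penalty = penalty + gaussian_mmd(phi_v0, phi_v1, sigma)
    return penalty

def training_step(encoder, head, x, y, v, alpha=1.0):
    """One training objective: cross-entropy on the main label plus the
    conditional independence penalty, weighted by a coefficient alpha."""
    phi = encoder(x)                      # representation used to predict y
    logits = head(phi)
    ce = nn.functional.cross_entropy(logits, y)
    reg = conditional_independence_penalty(phi, y, v)
    return ce + alpha * reg
```

In this sketch, setting `alpha = 0` recovers standard empirical risk minimization, while larger values trade in-distribution fit for invariance of the representation to the auxiliary label within each class.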