We propose to learn an invariant causal predictor that is robust to distributional shifts in the supervised regression scenario. Based on a disentangled causal factorization that describes the underlying data-generating process, we attribute distributional shifts to mutations of the generating factors. This covers a wide range of distributional shifts, since we make no prior specification of the causal structure or of the source of the mutations. Under this causal framework, we identify a set of invariant predictors based on the do-operator. We provide a necessary and sufficient condition for a predictor to be min-max optimal, i.e., to minimize the worst-case quadratic loss among all domains. This condition is justifiable under the Markovian and faithfulness assumptions, which inspires a practical algorithm to identify the optimal predictor. For empirical estimation, we propose a permutation-regeneration scheme guided by a local causal discovery procedure. The utility and effectiveness of our method are demonstrated on simulated data and in two real-world applications: Alzheimer's disease diagnosis and gene function prediction.
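The min-max criterion can be illustrated with a toy sketch. This is not the authors' algorithm: it does not use the do-operator, local causal discovery, or the permutation-regeneration scheme. It only scores a fixed set of candidate predictors by their worst-case quadratic loss, max over domains e of E_e[(Y - f(X))^2], and picks the smallest. The helper names `worst_case_mse` and `minmax_select` and the two-domain data are hypothetical, and each domain is assumed to be given as a pair of NumPy arrays (X, y).

```python
import numpy as np

def worst_case_mse(predict, domains):
    """Largest mean quadratic loss of `predict` over a list of (X, y) domains."""
    return max(float(np.mean((y - predict(X)) ** 2)) for X, y in domains)

def minmax_select(candidates, domains):
    """Return the candidate name with the smallest worst-case loss, plus all losses."""
    losses = {name: worst_case_mse(f, domains) for name, f in candidates.items()}
    best = min(losses, key=losses.get)
    return best, losses

# Hypothetical toy data: y is generated from x_c by a fixed mechanism, while the
# mechanism generating x_s from y "mutates" (flips sign) between the two domains.
x_c = np.array([0.0, 1.0, 2.0, 3.0])
noise = np.array([0.1, -0.1, 0.1, -0.1])
y = x_c + noise
domains = [
    (np.column_stack([x_c, y]), y),   # domain 1: x_s = y
    (np.column_stack([x_c, -y]), y),  # domain 2: x_s = -y
]

candidates = {
    "causal": lambda X: X[:, 0],    # predicts from the stable feature x_c
    "spurious": lambda X: X[:, 1],  # predicts from the mutating feature x_s
}

best, losses = minmax_select(candidates, domains)
print(best, losses)  # best == "causal"
```

The "spurious" predictor is perfect in domain 1 but has a large loss in domain 2, so its worst case is far worse than that of the "causal" predictor. This is the kind of failure the min-max criterion is meant to rule out.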