We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, when noisy proxies of those variables are available. Our approach takes the form of a regularization term that trades off between in-distribution performance and robustness to interventions. Under the assumption of a linear structural causal model, we show that a single proxy can be used to create estimators that are prediction optimal under interventions of bounded strength. This strength depends on the magnitude of the measurement noise in the proxy, which is, in general, not identifiable. In the case of two proxy variables, we propose a modified estimator that is prediction optimal under interventions up to a known strength. We further show how to extend these estimators to scenarios where additional information about the "test time" intervention is available during training. We evaluate our theoretical findings in synthetic experiments and using real data of hourly pollution levels across several cities in China.
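As a rough illustration of how a proxy can enter a regularization term, the sketch below penalizes the part of the residual that is linearly explained by the proxy, in the style of anchor regression. This specific penalty, the function name `proxy_regularized_ols`, the parameter `lam`, and all simulation coefficients are assumptions made for illustration; the abstract does not spell out the exact estimator, and the noise correction needed for the optimality guarantees is not attempted here. Data are assumed to be centered, so no intercept is fitted.

```python
import numpy as np


def proxy_regularized_ols(X, y, W, lam):
    """Minimize ||y - X b||^2 + lam * ||P_W (y - X b)||^2 over b.

    P_W is the orthogonal projection onto the column space of the proxy W.
    Illustrative anchor-regression-style penalty, not necessarily the
    estimator proposed in the paper.
    """
    def project(M):
        # Least-squares fit of M on W, i.e. P_W M.
        coef, *_ = np.linalg.lstsq(W, M, rcond=None)
        return W @ coef

    # With T = I + s * P_W and s = sqrt(1 + lam) - 1, we have
    # ||T r||^2 = ||r||^2 + lam * ||P_W r||^2, so the penalized problem
    # is ordinary least squares on the transformed data (T X, T y).
    s = np.sqrt(1.0 + lam) - 1.0
    X_t = X + s * project(X)
    y_t = y + s * project(y)
    beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta


# Synthetic linear SCM; every coefficient is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
n = 5000
A = rng.normal(size=(n, 1))        # unobserved variable subject to intervention
W = A + rng.normal(size=(n, 1))    # noisy proxy of A
H = rng.normal(size=(n, 1))        # hidden confounder of X and y
X = np.hstack([A + H + rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])
y = X[:, 0] + 2.0 * H[:, 0] + A[:, 0] + rng.normal(size=n)

beta_ols = proxy_regularized_ols(X, y, W, lam=0.0)   # lam = 0 recovers OLS
beta_reg = proxy_regularized_ols(X, y, W, lam=10.0)  # stronger regularization
```

Setting `lam=0` gives the ordinary least-squares fit, while larger values shrink the component of the residual that is predictable from the proxy, which is the trade-off between in-distribution performance and robustness described above.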