The defining challenge for causal inference from observational data is the presence of `confounders', covariates that affect both treatment assignment and the outcome. To address this challenge, practitioners collect and adjust for the covariates, hoping that they adequately correct for confounding. However, including every observed covariate in the adjustment runs the risk of including `bad controls', variables that induce bias when they are conditioned on. The problem is that we do not always know which variables in the covariate set are safe to adjust for and which are not. To address this problem, we develop Nearly Invariant Causal Estimation (NICE). NICE uses invariant risk minimization (IRM) [Arj19] to learn a representation of the covariates that, under some assumptions, strips out bad controls but preserves sufficient information to adjust for confounding. Adjusting for the learned representation, rather than the covariates themselves, avoids the induced bias and provides valid causal inferences. We evaluate NICE on both synthetic and semi-synthetic data. When the covariates contain unknown collider variables and other bad controls, NICE performs better than adjusting for all the covariates.
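As a rough mathematical sketch of the two ingredients mentioned above (the notation $\Phi$ for the learned representation, $R^e$ for the risk in environment $e$, and $\hat{\tau}$ for the estimated average treatment effect is introduced here only for illustration and is not taken from the abstract), the penalty form of the IRM objective from [Arj19] and a standard adjustment estimator applied to the learned representation can be written as
\[
\min_{\Phi}\; \sum_{e \in \mathcal{E}_{\mathrm{tr}}} \Big[\, R^{e}(\Phi) \;+\; \lambda\, \big\| \nabla_{w \,\mid\, w=1.0}\, R^{e}(w \cdot \Phi) \big\|^{2} \,\Big],
\qquad
\hat{\tau} \;=\; \frac{1}{n}\sum_{i=1}^{n} \Big( \hat{\mathbb{E}}\big[Y \mid T=1, \Phi(X_i)\big] \;-\; \hat{\mathbb{E}}\big[Y \mid T=0, \Phi(X_i)\big] \Big),
\]
where $\lambda$ trades off predictive fit against invariance of the classifier $w$ across training environments $\mathcal{E}_{\mathrm{tr}}$. The left-hand objective is the IRMv1 formulation of [Arj19]; the right-hand expression is the usual covariate-adjustment estimator, applied to $\Phi(X)$ rather than to the raw covariates $X$. How NICE combines these pieces is specified in the body of the paper; this display is only a sketch under the stated notational assumptions.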