A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify whether hidden confounders are present from a single dataset. Under the assumption of independent causal mechanisms underlying the data-generating process, we demonstrate a way to detect unobserved confounders given multiple observational datasets from different environments. We present a theory of testable conditional independencies that are absent only when hidden confounding is present, and we examine cases in which its assumptions are violated: degenerate and dependent mechanisms, and faithfulness violations. Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset. In most cases, our theory correctly predicts the presence of hidden confounding, particularly when the confounding bias is large.