Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a widespread effect on many of the observed ones is particularly difficult: most pairs of observed variables are conditionally dependent given any subset of the others, which renders the causal effect unidentifiable. In this paper, we show that beyond conditional independencies, and under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows spurious and causal effects to be disentangled. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among the observed variables may be recovered approximately, and we propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems. In addition, we find that although we impose conditions to guarantee causal recovery, performance in practice is robust to large deviations from the model assumptions.
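To give some intuition for the setting described above, the following is a minimal simulation sketch and not the algorithm proposed in the paper. It assumes a hypothetical linear structural model X = XB + HΓ + E with a sparse, strictly upper-triangular B (the DAG), dense Gaussian loadings Γ for q latent variables H, and unit-variance Gaussian noise E; all names, sizes, and weight ranges are illustrative assumptions. Comparing the covariance spectrum with and without the latent term shows one way in which dense confounding can show up in the observed distribution.

```python
import numpy as np


def simulate(n, p, q, edge_prob=0.1, seed=0):
    """Sample from a sparse linear Gaussian DAG over p observed variables,
    once without and once with q dense latent confounders.
    All modelling choices here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Strictly upper-triangular support makes the graph acyclic by construction.
    support = np.triu(rng.random((p, p)) < edge_prob, k=1)
    weights = rng.uniform(0.3, 0.7, size=(p, p)) * rng.choice([-1.0, 1.0], size=(p, p))
    B = support * weights
    gamma = rng.normal(size=(q, p))  # dense loadings: each latent touches every variable
    H = rng.normal(size=(n, q))      # unobserved confounders
    E = rng.normal(size=(n, p))      # independent noise terms
    # Solve X = X B + (latent term) + E for X.
    mix = np.linalg.inv(np.eye(p) - B)
    X_clean = E @ mix
    X_conf = (H @ gamma + E) @ mix
    return X_clean, X_conf


def spectrum(X):
    """Eigenvalues of the sample covariance, largest first."""
    cov = np.cov(X, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]


X_clean, X_conf = simulate(n=5000, p=30, q=2)
eig_clean = spectrum(X_clean)
eig_conf = spectrum(X_conf)
print("top 4 eigenvalues, no confounding:", np.round(eig_clean[:4], 2))
print("top 4 eigenvalues, confounded:    ", np.round(eig_conf[:4], 2))
```

In this toy model the confounded covariance equals the unconfounded one plus a positive semidefinite low-rank term, so its leading eigenvalues are inflated. Whether and how such structure can be exploited for recovery depends on the conditions stated in the paper itself.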