Learning causal structure from observational data is a fundamental challenge in machine learning. However, most commonly used differentiable causal discovery methods, which recast the problem as a continuous optimization task, are non-identifiable and therefore prone to data biases. In many real-life situations, data are collected from different environments in which the functional relations remain consistent while the distribution of the additive noise may vary. This paper proposes Differentiable Invariant Causal Discovery (DICD), which exploits multi-environment information within a differentiable framework to avoid learning spurious edges and wrong causal directions. Specifically, DICD aims to discover the environment-invariant causation while removing the environment-dependent correlation. We further formulate a constraint that enforces the target structural equation model to remain optimal across the environments. Theoretical guarantees for the identifiability of the proposed DICD are provided under mild conditions, given enough environments. Extensive experiments on synthetic and real-world datasets verify that DICD outperforms state-of-the-art causal discovery methods by up to 36% in SHD. Our code will be open-sourced.
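To make the multi-environment assumption concrete, the following is a minimal sketch and not the DICD algorithm itself. It assumes a hypothetical two-variable linear model X1 = N1, X2 = b·X1 + N2 with Gaussian noise; the names `simulate_env` and `ols_slope`, the coefficient `b = 2.0`, and the noise scales are all illustrative choices that do not come from the paper. It shows that a regression fitted in the causal direction recovers roughly the same coefficient in every environment, whereas the anti-causal regression shifts when the noise distribution changes.

```python
import numpy as np

def simulate_env(b, sigma1, sigma2, n, rng):
    """Sample n points from X1 = N1, X2 = b * X1 + N2 (zero-mean Gaussian noise)."""
    x1 = sigma1 * rng.normal(size=n)
    x2 = b * x1 + sigma2 * rng.normal(size=n)
    return x1, x2

def ols_slope(cause, effect):
    """Least-squares slope of `effect` on `cause` (no intercept: data is zero-mean)."""
    return float(cause @ effect / (cause @ cause))

rng = np.random.default_rng(0)
b = 2.0  # shared functional relation (illustrative value)
# Hypothetical environments: same mechanism, different noise scales (sigma1, sigma2).
noise_scales = [(1.0, 1.0), (3.0, 0.5), (0.5, 2.0)]
envs = [simulate_env(b, s1, s2, 50_000, rng) for s1, s2 in noise_scales]

# Fit each direction separately in every environment.
forward = [ols_slope(x1, x2) for x1, x2 in envs]   # causal direction X1 -> X2
backward = [ols_slope(x2, x1) for x1, x2 in envs]  # anti-causal direction X2 -> X1

forward_spread = max(forward) - min(forward)
backward_spread = max(backward) - min(backward)
print("causal slopes per environment:     ", np.round(forward, 3))
print("anti-causal slopes per environment:", np.round(backward, 3))
print("spread (causal, anti-causal):", round(forward_spread, 3), round(backward_spread, 3))
```

In this toy setting the population anti-causal slope is b·σ1² / (b²·σ1² + σ2²), which depends on the noise scales, while the causal slope is b in every environment. A model that is required to be optimal in all environments at once can therefore be satisfied only by the causal direction, which is the intuition behind using environment invariance to rule out wrong edge orientations.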