Learning causal structure from observational data is a fundamental challenge in machine learning. Most commonly used differentiable causal discovery methods are non-identifiable: they recast the problem as a continuous optimization task that is prone to data biases. In many real-life situations, data is collected from different environments in which the functional relations remain consistent while the distribution of the additive noise may vary. This paper proposes Differentiable Invariant Causal Discovery (DICD), which uses multi-environment information within a differentiable framework to avoid learning spurious edges and wrong causal directions. Specifically, DICD aims to discover the environment-invariant causation while removing the environment-dependent correlation. We further formulate a constraint that enforces the target structural equation model to remain optimal across the environments. Theoretical guarantees for the identifiability of the proposed DICD are provided under mild conditions, given a sufficient number of environments. Extensive experiments on synthetic and real-world datasets verify that DICD outperforms state-of-the-art causal discovery methods by up to 36% in structural Hamming distance (SHD). Our code will be open-sourced upon acceptance.
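The multi-environment setting described above can be illustrated with a toy example. The sketch below is not the DICD method; it only assumes a hypothetical two-variable linear model X -> Y whose mechanism (the weight `weight`) is shared across environments while the noise scales differ. The function names, the weight, and the noise scales are all illustrative assumptions. Under these assumptions, the regression slope in the causal direction stays roughly constant across environments, whereas the slope in the anticausal direction shifts with the noise distribution.

```python
import numpy as np


def simulate_environment(n, noise_scale_x, noise_scale_y, rng, weight=2.0):
    """Sample from a toy model X -> Y: fixed linear mechanism, environment-specific noise."""
    x = rng.normal(0.0, noise_scale_x, size=n)
    y = weight * x + rng.normal(0.0, noise_scale_y, size=n)
    return x, y


def ols_slope(a, b):
    """Least-squares slope of b regressed on a (no intercept; both are zero-mean here)."""
    return float(a @ b / (a @ a))


rng = np.random.default_rng(0)

# Hypothetical noise scales (std of X noise, std of Y noise) for three environments.
environments = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]

causal_slopes, anticausal_slopes = [], []
for scale_x, scale_y in environments:
    x, y = simulate_environment(20_000, scale_x, scale_y, rng)
    causal_slopes.append(ols_slope(x, y))      # regress Y on X (true direction)
    anticausal_slopes.append(ols_slope(y, x))  # regress X on Y (wrong direction)

# Spread of each slope across environments: small for the invariant mechanism,
# large for the environment-dependent correlation.
causal_spread = max(causal_slopes) - min(causal_slopes)
anticausal_spread = max(anticausal_slopes) - min(anticausal_slopes)

print("causal slopes:    ", np.round(causal_slopes, 3))
print("anticausal slopes:", np.round(anticausal_slopes, 3))
```

In this toy case, only the model fitted in the true direction remains optimal in every environment, which is the kind of invariance the abstract's constraint refers to; how DICD enforces it over a full graph is not specified here.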