Mining the genuine mechanisms underlying complex data-generation processes in real-world systems is a fundamental step toward promoting the interpretability of, and thus trust in, data-driven models. To this end, we propose a variation-based cause-effect identification (VCEI) framework for causal discovery in bivariate systems from a single observational setting. Our framework relies on the principle of independence of cause and mechanism (ICM) under the assumption of an existing acyclic causal link, and offers a practical realization of this principle. Specifically, we artificially construct two settings in which the marginal distributions of one covariate, claimed to be the cause, are guaranteed to have non-negligible variations. This is achieved by re-weighting samples of the marginal so that the resultant distribution is notably distinct from this marginal according to some discrepancy measure. In the causal direction, such variations are expected to have no impact on the effect generation mechanism. Therefore, quantifying the impact of these variations on the conditionals reveals the genuine causal direction. Moreover, we formulate our approach in terms of the kernel-based maximum mean discrepancy (MMD), which lifts all constraints on the data types of the cause and effect covariates and renders such artificial interventions a convex optimization problem. We provide a series of experiments on real and synthetic data showing that VCEI is, in principle, competitive with other cause-effect identification frameworks.
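The re-weighting idea can be illustrated with a minimal sketch. It is not the authors' optimization procedure: the weights below are picked by hand rather than obtained from the convex program, and the names `rbf_kernel`, `weighted_mmd2`, the Gaussian kernel, and the bandwidth are assumptions made for illustration. The sketch only shows how the squared MMD between a re-weighted empirical marginal and the original (uniformly weighted) one reduces to a quadratic form in the weight difference.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between two 1-D sample arrays."""
    sq_dist = (x[:, None] - y[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def weighted_mmd2(K, w, v):
    """Squared MMD between two weightings w, v of the same samples.

    Both distributions live on the same sample points, so the squared
    MMD is the quadratic form (w - v)^T K (w - v).
    """
    d = w - v
    return float(d @ K @ d)


rng = np.random.default_rng(0)
x = rng.normal(size=200)          # samples of the candidate cause
K = rbf_kernel(x, x)

n = len(x)
uniform = np.full(n, 1.0 / n)     # original empirical marginal

# Hypothetical hand-picked re-weighting (an assumption, not the paper's
# optimized weights): samples below the median count three times as much.
raw = np.where(x < np.median(x), 3.0, 1.0)
weights = raw / raw.sum()

mmd2_shift = weighted_mmd2(K, weights, uniform)   # re-weighted vs. original
mmd2_self = weighted_mmd2(K, uniform, uniform)    # original vs. itself

print(f"MMD^2 (re-weighted vs. original): {mmd2_shift:.4f}")
print(f"MMD^2 (original vs. itself):      {mmd2_self:.4f}")
```

Because the discrepancy is a quadratic form in the weights with a positive semi-definite kernel matrix, constraints that bound it from above are convex in the weights. How the paper actually poses the optimization that guarantees a non-negligible variation is not stated in the abstract and is not reproduced here.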