Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (taking values in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
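To make the selection rule concrete, the following is a minimal sketch, not the statistical test or the closed-form UC estimate mentioned above. It assumes that a uniform channel is one whose rows are all permutations of a single probability vector, so that sorting each row of the estimated conditional pmf should give (nearly) identical vectors. The function names and the L1-based score `uc_distance` are hypothetical choices made for illustration only.

```python
import numpy as np


def conditional_pmf(x, y):
    """Estimate p(y | x) from paired categorical samples.

    Returns a row-stochastic matrix (one row per observed value of x)
    and the number of samples behind each row.
    """
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(counts, (x_idx, y_idx), 1)  # joint contingency table
    row_counts = counts.sum(axis=1)
    return counts / row_counts[:, None], row_counts


def uc_distance(x, y):
    """Heuristic (assumed) distance of the estimated p(y | x) from a UC.

    Each row is sorted in decreasing order; in a UC all sorted rows coincide.
    The score is the weighted mean L1 distance of the sorted rows to their
    weighted average. This is an illustrative stand-in for a proper test.
    """
    channel, row_counts = conditional_pmf(x, y)
    sorted_rows = np.sort(channel, axis=1)[:, ::-1]
    weights = row_counts / row_counts.sum()
    common = weights @ sorted_rows  # crude estimate of the shared row vector
    return float(weights @ np.abs(sorted_rows - common).sum(axis=1))


def more_uniform_direction(x, y):
    """Pick the direction whose conditional pmf is closer to a UC."""
    return "X -> Y" if uc_distance(x, y) < uc_distance(y, x) else "Y -> X"


# Usage: samples whose p(y | x) has rows that are permutations of (0.6, 0.3, 0.1),
# drawn with a non-uniform distribution over x.
table = np.array([[6, 3, 1], [2, 12, 6], [9, 3, 18]])
xs, ys = np.nonzero(table)
x = np.repeat(xs, table[xs, ys])
y = np.repeat(ys, table[xs, ys])
print(more_uniform_direction(x, y))
```

Sorting the rows removes the dependence on which permutation each row uses, which is why the score does not require any ordering of the categories. With finite samples the score is never exactly zero in either direction, so a real decision would need a test that accounts for sampling noise.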