In recent years, a great deal of research has been conducted on causal inference and causal learning. Many methods have been developed to identify cause-effect pairs and have been successfully applied to observational real-world data in order to determine the direction of causal relationships. Many of these methods require simplifying assumptions, such as the absence of confounding, cycles, and selection bias. Yet causal discovery remains challenging in the bivariate setting. One class of methods that can also handle the bivariate case is based on Additive Noise Models (ANMs). Unfortunately, one aspect of these methods has received little attention so far: how do different noise levels affect their ability to identify the direction of the causal relationship? This work aims to bridge this gap with an empirical study. We consider the bivariate case, the most elementary form of a causal discovery problem, in which one must decide whether X causes Y or Y causes X given the joint distribution of two variables X and Y. Two specific methods were selected, \textit{Regression with Subsequent Independence Test} and \textit{Identification using Conditional Variances}, and tested on an exhaustive range of ANMs in which the level of the additive noise is varied gradually from 1% to 10000% of the cause's noise level (the latter remains fixed). The experiments also cover several types of distributions as well as both linear and non-linear ANMs. The results show that these methods can fail to capture the true causal direction for some levels of noise.
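To make the experimental setup concrete, the following is a minimal sketch of a noise-level sweep for a regression-then-independence-test procedure on a bivariate ANM. It is an illustration under stated assumptions, not the implementation used in this work: "noise level" is taken to mean the standard deviation, the cause and the noise are Gaussian, the mechanism is a cubic function, polynomial least squares stands in for the regression step, and a biased HSIC estimate with Gaussian kernels stands in for the independence test. All function names (`hsic`, `resit_direction`, `simulate_anm`, `sweep_noise_levels`) are hypothetical.

```python
import numpy as np


def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        bandwidth = np.median(d2[d2 > 0])  # assumes not all values are identical
        return np.exp(-d2 / bandwidth)

    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ centering @ gram(b) @ centering) / n ** 2


def residual_dependence(cause, effect, degree=3):
    """Regress effect on cause, then score dependence between cause and residuals."""
    coeffs = np.polyfit(cause, effect, degree)  # assumption: polynomial regression
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def resit_direction(x, y):
    """Pick the direction whose regression residuals look more independent of the input."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y)   # hypothesis X -> Y
    backward = residual_dependence(y, x)  # hypothesis Y -> X
    return "X->Y" if forward < backward else "Y->X"


def simulate_anm(n, noise_ratio, rng):
    """Sample Y = X^3 + N_Y, with std(N_Y) = noise_ratio * std(X) and std(X) fixed at 1."""
    x = rng.normal(0.0, 1.0, n)
    noise = noise_ratio * rng.normal(0.0, 1.0, n)
    return x, x ** 3 + noise


def sweep_noise_levels(ratios, n=200, trials=20, seed=0):
    """Fraction of trials in which the true direction X -> Y is recovered, per ratio."""
    rng = np.random.default_rng(seed)
    accuracy = {}
    for ratio in ratios:
        hits = sum(
            resit_direction(*simulate_anm(n, ratio, rng)) == "X->Y"
            for _ in range(trials)
        )
        accuracy[ratio] = hits / trials
    return accuracy


if __name__ == "__main__":
    # Ratios 0.01 to 100 correspond to noise levels of 1% to 10000% of the cause's.
    for ratio, acc in sweep_noise_levels([0.01, 0.1, 1.0, 10.0, 100.0]).items():
        print(f"noise ratio {ratio:>6}: fraction correct = {acc:.2f}")
```

The sweep keeps the cause's scale fixed and changes only the scale of the additive noise, mirroring the design described above. Swapping in other distributions or a linear mechanism only requires changing `simulate_anm`.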