This paper considers an extension of the linear non-Gaussian acyclic model (LiNGAM), which determines the causal order among variables from a dataset when the variables are expressed by a set of linear equations with noise terms. In particular, we assume that the variables are binary. The existing LiNGAM assumes that no confounding is present, which is restrictive in practice. Based on the concept of independent component analysis (ICA), this paper proposes an extended framework in which the mutual information among the noises is minimized. Another significant contribution is to reduce the search for the causal order to a shortest path problem. The distance between each pair of nodes expresses an associated mutual information value, and the path with the minimum sum (KL divergence) is sought. Although a naive search would compare $p!$ mutual information values, one for each ordering of the $p$ variables, the proposed method dramatically reduces the computation when no confounding is present. The proposed algorithm finds the globally optimal solution, whereas existing methods seek the order greedily and locally based on hypothesis testing. For mutual information estimation, we use an estimator that is optimal in the Bayes/MDL sense and correctly detects independence. Experiments using synthetic and real data show that the proposed version of LiNGAM achieves significantly better performance, particularly when confounding is present.
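To illustrate how casting order search as a shortest path problem can avoid enumerating all $p!$ orderings, the following is a minimal sketch, not the paper's algorithm. It assumes the total cost of an ordering decomposes into step costs `cost(j, S)` for placing variable `j` after the set `S` of variables already ordered. The names `best_order`, `cost`, and `toy_cost` are hypothetical, and the toy cost merely stands in for a mutual information estimate. Under that assumption, the search is a shortest path over subsets of variables, with about $2^p$ nodes.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple


def best_order(
    p: int, cost: Callable[[int, FrozenSet[int]], float]
) -> Tuple[float, List[int]]:
    """Return (minimum total cost, ordering) over variables 0..p-1.

    cost(j, S) is a hypothetical step cost for placing variable j
    after the variables in S have already been ordered.
    """
    dist: Dict[FrozenSet[int], float] = {frozenset(): 0.0}
    prev: Dict[FrozenSet[int], Tuple[FrozenSet[int], int]] = {}

    # Visit subsets by increasing size, so dist[S] is final before S is expanded.
    for size in range(p):
        for combo in combinations(range(p), size):
            S = frozenset(combo)
            for j in range(p):
                if j in S:
                    continue
                T = S | {j}
                d = dist[S] + cost(j, S)
                if T not in dist or d < dist[T]:
                    dist[T] = d
                    prev[T] = (S, j)

    # Backtrack from the full set to the empty set to recover the ordering.
    full = frozenset(range(p))
    order: List[int] = []
    T = full
    while T:
        S, j = prev[T]
        order.append(j)
        T = S
    order.reverse()
    return dist[full], order


def toy_cost(j: int, S: FrozenSet[int]) -> float:
    # Placeholder cost (an assumption): zero only when variable j is
    # placed at position j, so the identity ordering is the unique optimum.
    return float(abs(j - len(S)))


total, order = best_order(4, toy_cost)
print(total, order)
```

The sketch evaluates `cost` on the order of $p \cdot 2^{p-1}$ times instead of scoring $p!$ orderings separately. Whether such a decomposition holds in the paper's setting, and how the costs are estimated from data, is not stated in the abstract.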