Causality analysis is an important problem lying at the heart of science, and is of particular importance in data science and machine learning. An endeavor over the past 16 years to view causality as a real physical notion, so as to formulate it from first principles, seems however to have gone unnoticed. This study introduces this line of work to the community, with a long-overdue generalization of information flow-based bivariate time series causal inference to multivariate series, based on recent advances in the theory. The resulting formula is transparent, and can be implemented as a computationally very efficient algorithm. It can be normalized, and tested for statistical significance. Unlike previous work along this line, where only information flows are estimated, here an algorithm is also implemented to quantify the influence of a unit on itself. While this poses a challenge in some causal inference methods, here it comes naturally, so the self-loops in a causal graph are identified automatically as the causalities along the edges are inferred. To demonstrate the power of the approach, two applications in extreme situations are presented. The first is a network of multivariate processes buried in heavy noise (with the noise-to-signal ratio exceeding 100), and the second is a network of nearly synchronized chaotic oscillators. In both graphs, confounding processes exist. While reconstructing these causal graphs from the given series appears to be a huge challenge, a straightforward application of the algorithm immediately recovers them. In particular, the confounding processes are accurately differentiated. Considering the surge of interest in the community, this study is very timely.
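To give a concrete sense of why the multivariate formula can be called transparent and cheap to compute, the following is a minimal sketch, not the authors' implementation. It assumes the linear maximum-likelihood estimator as it is commonly stated in this line of work, T(j→i) = (C⁻¹ c_di)_j · C_ij / C_ii, where C is the sample covariance matrix of the series and c_di is the vector of covariances between each series and the Euler forward difference of series i. The function names `information_flow_matrix` and `simulate_one_way_pair`, the test system, and its coefficients are hypothetical and chosen only for illustration. Normalization and significance testing are not included.

```python
import numpy as np


def information_flow_matrix(x, dt=1.0):
    """Estimate pairwise information flows from multivariate series.

    x has shape (n_steps, n_series). The returned matrix T is indexed so
    that T[j, i] is the estimated flow from series j to series i, in nats
    per unit time. Assumed formula (see lead-in): (C^-1 c_di)_j * C_ij / C_ii.
    """
    x = np.asarray(x, dtype=float)
    dx = (x[1:] - x[:-1]) / dt            # Euler forward difference
    xc = x[:-1] - x[:-1].mean(axis=0)     # centered states
    dxc = dx - dx.mean(axis=0)            # centered differences
    n = xc.shape[0]
    cov = xc.T @ xc / (n - 1)             # cov[j, k] = C_jk
    cov_d = xc.T @ dxc / (n - 1)          # cov_d[k, i] = cov(x_k, dx_i)
    coef = np.linalg.solve(cov, cov_d)    # coef[j, i] = (C^-1 c_di)_j
    ratio = cov / np.diag(cov)[np.newaxis, :]  # ratio[j, i] = C_ji / C_ii
    return coef * ratio


def simulate_one_way_pair(n_steps=20000, seed=0):
    """Hypothetical test system: series 1 drives series 0, not vice versa."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, 2))
    noise = rng.standard_normal((n_steps, 2))
    for n in range(n_steps - 1):
        x[n + 1, 0] = 0.5 * x[n, 0] + 0.5 * x[n, 1] + noise[n, 0]
        x[n + 1, 1] = 0.6 * x[n, 1] + noise[n, 1]
    return x
```

Calling `information_flow_matrix(simulate_one_way_pair())` should give a clearly nonzero entry `T[1, 0]` (the driving direction) and an entry `T[0, 1]` close to zero, reflecting the one-way coupling built into the toy system. The whole computation is one covariance estimate and one linear solve, which is where the efficiency comes from. In this sketch the diagonal entries reduce to the fitted self-coefficients; whether that matches the self-influence measure of the study is an assumption that should be checked against the paper itself.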