Causal Discovery (CD) is the process of identifying the cause-effect relationships among the variables of a system from data. Over the years, several methods, based primarily on the statistical properties of data, have been developed to uncover the underlying causal mechanism. In this study, we present an extensive discussion of the methods designed to perform causal discovery from both independent and identically distributed (i.i.d.) data and time series data. We first introduce the common terminology of causal discovery, and then provide a comprehensive discussion of the algorithms designed to identify causal edges in different settings. We further discuss some of the benchmark datasets available for evaluating causal discovery methods, the tools and software packages available for performing causal discovery, and the common metrics used to evaluate these methods. We also test some common causal discovery algorithms on different benchmark datasets and compare their performance. Finally, we conclude by presenting the common challenges involved in causal discovery and discussing its applications in multiple areas of interest.