Causal Bayesian Networks (CBNs) have become a powerful technology for reasoning under uncertainty, particularly in areas that require transparency and explainability, and they rely on causal assumptions that enable us to simulate the effect of intervention. The graphical structure of these models can be specified from causal knowledge, learned from data using structure learning algorithms, or obtained from a combination of both. Various knowledge approaches have been proposed in the literature that enable us to specify prior knowledge that constrains or guides these algorithms. The objective of this paper is to investigate the impact of causal knowledge on structure learning across different settings that we might encounter in practice. We do this by using a more comprehensive set of existing and new knowledge approaches that enable us to obtain knowledge from heterogeneous sources, and by considering a more comprehensive list of algorithms, case studies, and experimental settings. Each approach is assessed in terms of structure learning effectiveness and efficiency, including graphical accuracy, model fitting, complexity, and runtime, making this the first paper that provides a comparative evaluation of a wide range of knowledge approaches for structure learning. Because the value of knowledge depends on what data are available, we illustrate the results with both limited and big data. While the overall results show that knowledge becomes less important with big data, because learning accuracy is higher, some of the knowledge approaches are actually found to be more important with big data. Amongst the main conclusions is the observation that the reduced search space obtained from knowledge does not always imply reduced computational complexity, perhaps because the relationships implied by the data and knowledge are in tension.