Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.
翻译:从观测数据中得出的因果发现是许多科学分支的一个重要工具。 在某些假设下, 科学家可以解释现象、 预测和作出决定。 在大量抽样限值中, 先前引入了健全和完整的因果发现算法, 即直接的环绕图(DAG)或其等同等级, 代表因果关系。 然而, 在现实世界中, 仅有有限的培训数据, 限制了这些算法使用的统计测试能力, 导致推断因果模型中的错误。 通常通过设计尽可能少使用统计测试的战略来解决这个问题。 在本文中, 我们引入了这样一种战略, 以循环式包装方式处理现有的基于限制的因果发现算法, 以保持稳健性和完整性。 它从一开始就将观察到的变量归为组合, 使用常规的微切标准进行回溯跟踪学习本地子图时使用基线性因果发现算法。 然后将这些数据合并起来, 确保完整性。 通过模拟研究, 使用合成数据, 和共同的现实世界基准, 我们证明我们的方法需要比基准更短得多的统计测试, 学习更精确的图表。