Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large nonlinear models, these methods rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm, which we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce a method for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.
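The score-based search over directed trees described above amounts to finding a maximum-weight spanning arborescence: each directed edge carries a score, and we seek the directed tree maximizing the total score. The sketch below is purely illustrative and not the paper's implementation: it brute-forces the search over all parent assignments (the function name and example edge scores are hypothetical), whereas CAT uses Chu-Liu-Edmonds' algorithm to solve the same problem in polynomial time.

```python
from itertools import product


def max_arborescence_brute_force(n, score):
    """Return the highest-scoring directed spanning tree over nodes 0..n-1.

    `score[(u, v)]` is the gain from adding edge u -> v; missing edges are
    treated as forbidden. Brute force for illustration only; CAT relies on
    Chu-Liu-Edmonds' algorithm for an efficient solution.
    """
    best, best_edges = float("-inf"), None
    for root in range(n):
        others = [v for v in range(n) if v != root]
        # Every non-root node picks exactly one parent.
        for parents in product(range(n), repeat=len(others)):
            edges = list(zip(parents, others))
            if any(u == v for u, v in edges):
                continue  # no self-loops
            # The choice is an arborescence iff all nodes are reachable
            # from the root (each non-root already has in-degree one).
            children = {}
            for u, v in edges:
                children.setdefault(u, []).append(v)
            reached, frontier = {root}, [root]
            while frontier:
                u = frontier.pop()
                for v in children.get(u, []):
                    if v not in reached:
                        reached.add(v)
                        frontier.append(v)
            if len(reached) < n:
                continue
            total = sum(score.get((u, v), float("-inf")) for u, v in edges)
            if total > best:
                best, best_edges = total, sorted(edges)
    return best, best_edges


# Hypothetical edge scores for a 3-node example.
scores = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0,
          (1, 0): 0.5, (2, 0): 0.1, (2, 1): 0.2}
best, edges = max_arborescence_brute_force(3, scores)
# best = 5.0, achieved by the chain 0 -> 1 -> 2
```

In CAT, such edge scores are derived from the data (e.g., from gains in model fit when regressing one variable on another); the combinatorial step shown here is what Chu-Liu-Edmonds' algorithm makes scalable.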