We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider extensions to the sub-Gaussian case, and then study the estimation of the inverse correlation matrix under such models. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
翻译:我们感兴趣的是,当数据来自线性结构方程模型(SEM)和因果结构可以以多树为特征时,学习定向圆形图(DAG)的问题。在高山多树模型中,我们研究了众所周知的周立算法样本大小的充分条件,以完全恢复多树的骨架和等值类别,而多树的骨架和等值类别以CPDAA为独特的代表。另一方面,骨架和CPDAG恢复所需的样本大小的必要条件也从信息理论下限中得出,这符合相应的充分条件,从而对这些任务的困难作了清晰的描述。我们还考虑扩展亚高加索地区案例,然后研究此类模型下的反相关矩阵估计。我们理论结论通过综合数字模拟得到说明,关于基准数据的实验还表明,当真正的图形结构只能由多树类所近似时,多树群学习的强度也非常强。