We analyze the complexity of learning directed acyclic graphical models from observational data in general settings without specific distributional assumptions. Our approach is information-theoretic and uses a local Markov boundary search procedure in order to recursively construct ancestral sets in the underlying graphical model. Perhaps surprisingly, we show that for certain graph ensembles, a simple forward greedy search algorithm (i.e. without a backward pruning phase) suffices to learn the Markov boundary of each node. This substantially improves the sample complexity, which we show is at most polynomial in the number of nodes. This is then applied to learn the entire graph under a novel identifiability condition that generalizes existing conditions from the literature. As a matter of independent interest, we establish finite-sample guarantees for the problem of recovering Markov boundaries from data. Moreover, we apply our results to the special case of polytrees, for which the assumptions simplify, and provide explicit conditions under which polytrees are identifiable and learnable in polynomial time. We further illustrate the performance of the algorithm, which is easy to implement, in a simulation study. Our approach is general, works for discrete or continuous distributions without distributional assumptions, and as such sheds light on the minimal assumptions required to efficiently learn the structure of directed graphical models from data.
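To make the forward greedy Markov boundary search concrete, here is a minimal sketch, assuming discrete data and plug-in conditional entropy estimates. The abstract does not specify the estimator, the stopping rule, or any function names, so `conditional_entropy`, `forward_markov_boundary`, `toy_chain_rows`, and the `threshold` parameter are all hypothetical illustrations rather than the authors' procedure. At each step the sketch adds the variable that most reduces the estimated conditional entropy of the target, and it stops, with no backward pruning phase, once the best gain falls below the threshold.

```python
import math
from collections import Counter


def conditional_entropy(rows, target, given):
    """Plug-in estimate (in nats) of H(X_target | X_given) from discrete samples."""
    joint = Counter()     # counts of (conditioning values, target value)
    marginal = Counter()  # counts of conditioning values alone
    for row in rows:
        key = tuple(row[j] for j in given)
        joint[(key, row[target])] += 1
        marginal[key] += 1
    n = len(rows)
    return -sum(c / n * math.log(c / marginal[key])
                for (key, _), c in joint.items())


def forward_markov_boundary(rows, target, threshold=0.01):
    """Forward-only greedy search: add the variable with the largest estimated
    conditional mutual information with the target; stop when the best gain
    is below `threshold` (an assumed stopping rule). No backward pruning."""
    num_vars = len(rows[0])
    selected = []
    current = conditional_entropy(rows, target, selected)
    while True:
        candidates = [j for j in range(num_vars)
                      if j != target and j not in selected]
        if not candidates:
            break
        scores = {j: conditional_entropy(rows, target, selected + [j])
                  for j in candidates}
        best = min(scores, key=scores.get)
        # The entropy drop equals the estimated I(X_target; X_best | X_selected).
        if current - scores[best] < threshold:
            break
        selected.append(best)
        current = scores[best]
    return selected


def toy_chain_rows():
    """Hypothetical toy data: a chain X0 -> X1 -> X2 plus an independent X3.
    Each child copies its parent with probability 3/4; rows are repeated in
    proportion to their probability, so empirical frequencies are exact."""
    rows = []
    for x0 in (0, 1):
        for x1 in (0, 1):
            for x2 in (0, 1):
                for x3 in (0, 1):
                    weight = (3 if x1 == x0 else 1) * (3 if x2 == x1 else 1)
                    rows.extend([(x0, x1, x2, x3)] * weight)
    return rows


if __name__ == "__main__":
    print(sorted(forward_markov_boundary(toy_chain_rows(), target=1)))
```

On this toy chain the search for node 1 selects its parent (node 0) and its child (node 2) and ignores the independent node 3. A forward-only search is not guaranteed to succeed in general; for instance, a target that depends on two parents only through their XOR shows no gain from either parent alone, which is consistent with the abstract's claim being restricted to certain graph ensembles.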