Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high-dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG), is highly challenging, mainly because of the vast number of possible networks combined with the acyclicity constraint. Efforts have focused on two fronts: constraint-based methods, which perform conditional independence tests to exclude edges, and score-and-search approaches, which explore the DAG space with greedy or MCMC schemes. Here we synthesise these two fields in a novel hybrid method that reduces the complexity of MCMC approaches to that of a constraint-based method. Individual steps in the MCMC scheme require only simple table lookups, so very long chains can be obtained efficiently. Furthermore, the scheme includes an iterative procedure to correct for errors from the conditional independence tests. The algorithm performs markedly better than alternatives, particularly because DAGs can also be sampled from the posterior distribution, enabling full Bayesian model averaging for much larger Bayesian networks.
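The abstract does not spell out the sampler, so the following is only a rough illustration of why precomputed score tables make MCMC steps cheap. It is a minimal Python sketch of plain order MCMC, a standard structure-sampling technique, and not necessarily the scheme used in this work. It assumes a hypothetical restricted search space `allowed` (standing in for the output of a constraint-based step) and a hypothetical toy `local_score`; all function names are illustrative. The iterative correction of conditional-independence errors is not shown, and plain order MCMC induces a non-uniform prior over DAGs, which more refined samplers correct for.

```python
import itertools
import math
import random


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def precompute_tables(allowed, local_score):
    """Score every subset of each node's allowed parents exactly once.
    `allowed` (node -> candidate parents) is a hypothetical search space,
    e.g. a skeleton from conditional independence tests."""
    tables = {}
    for node, cands in allowed.items():
        rows = []
        for k in range(len(cands) + 1):
            for ps in itertools.combinations(sorted(cands), k):
                rows.append((frozenset(ps), local_score(node, frozenset(ps))))
        tables[node] = rows
    return tables


def compatible_rows(v, pos, tables):
    """Precomputed (parent set, score) rows whose parents all precede v."""
    return [(ps, s) for ps, s in tables[v] if all(pos[p] < pos[v] for p in ps)]


def order_score(order, tables):
    """Log score of a node order: sum over nodes of the log-sum-exp of
    compatible parent-set scores. Only table lookups, no rescoring."""
    pos = {v: i for i, v in enumerate(order)}
    # The empty parent set is always compatible, so no list is empty.
    return sum(logsumexp([s for _, s in compatible_rows(v, pos, tables)])
               for v in order)


def order_mcmc(nodes, tables, n_iter, seed=0):
    """Metropolis-Hastings over node orders with random swap moves
    (assumes at least two nodes)."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    current = order_score(order, tables)
    samples = []
    for _ in range(n_iter):
        i, j = rng.sample(range(len(order)), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new = order_score(proposal, tables)
        # Swap proposals are symmetric, so accept with prob min(1, exp(diff)).
        if rng.random() < math.exp(min(0.0, new - current)):
            order, current = proposal, new
        samples.append(tuple(order))
    return samples


def sample_dag(order, tables, rng):
    """Draw one DAG consistent with `order`: each node's parent set is
    chosen with probability proportional to exp(local score)."""
    pos = {v: i for i, v in enumerate(order)}
    dag = {}
    for v in order:
        rows = compatible_rows(v, pos, tables)
        m = max(s for _, s in rows)
        weights = [math.exp(s - m) for _, s in rows]
        dag[v] = rng.choices([ps for ps, _ in rows], weights=weights)[0]
    return dag


if __name__ == "__main__":
    # Hypothetical 3-node example; the toy score rewards parent sets A->B->C.
    allowed = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    truth = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"B"})}
    tables = precompute_tables(allowed, lambda v, ps: -2.0 * len(ps ^ truth[v]))
    orders = order_mcmc(list(allowed), tables, n_iter=2000, seed=1)
    rng = random.Random(2)
    dags = [sample_dag(o, tables, rng) for o in orders[500:]]  # drop burn-in
    p_ab = sum("A" in d["B"] for d in dags) / len(dags)
    print(f"estimated posterior probability of edge A->B: {p_ab:.2f}")
```

The design point the sketch tries to convey is that all scoring happens once in `precompute_tables`; each MCMC step then only filters and sums stored values. Averaging a feature (here, the edge A->B) over DAGs drawn by `sample_dag` is a simple form of Bayesian model averaging.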