Causal Bayesian networks are widely used tools for summarising the dependencies between variables and elucidating their putative causal relationships. Restricting the search to trees, for example, makes learning the optimal tree from data polynomial, but this does not guarantee finding the optimal network overall. Without similar restrictions, exact discovery of the optimum is computationally hard in general, and no polynomial-time algorithms are known. The current state-of-the-art approaches are integer linear programming over the underlying space of directed acyclic graphs, dynamic programming and shortest-path searches over the space of topological orders, and constraint programming combining both. For dynamic programming over orders, the computational complexity is known to be exponential with base 2 in the number of variables in the network. We demonstrate how to use properties of Bayesian networks to prune the search space and lower the computational cost, while still guaranteeing exact discovery of the provably optimal network. We also introduce new path-search and divide-and-conquer criteria. Without constraining the search a priori to particular types of networks, the algorithm completes in quadratic time when the optimum is a matching, and in polynomial time when the optimum belongs to any network class whose largest connected components are logarithmically bounded in size. In simulation studies we observe this polynomial dependence for sparse networks and find that, beyond some critical density, the logarithm of the exponential base grows with the network density. At lower densities, our approach therefore outperforms the state of the art. These results pave the way for faster exact causal discovery in larger and sparser networks.
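To make the base-2 exponential baseline concrete, the following is a minimal sketch of the textbook subset dynamic programme over orders for exact structure learning. It is not the pruned algorithm described here. It assumes a decomposable score supplied as a hypothetical `local_score(v, parent_mask)` function; `toy_score` and `GOOD_EDGES` are illustrative placeholders, not a real BIC or BDe score. Both tables are indexed by subsets of variables, so they hold on the order of n·2^n entries, which is where the base-2 exponential comes from.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical decomposable local score: (variable, parent bitmask) -> score.
LocalScore = Callable[[int, int], float]


def exact_dp(n: int, local_score: LocalScore) -> Tuple[float, Dict[int, List[int]]]:
    """Exact optimal DAG by dynamic programming over subsets (sketch)."""
    full = (1 << n) - 1

    # best_ps[v][mask] = (score, parent mask) of the best parent set of v
    # chosen from the candidate set `mask` (mask never contains v).
    best_ps: List[Dict[int, Tuple[float, int]]] = []
    for v in range(n):
        table: Dict[int, Tuple[float, int]] = {}
        for mask in range(1 << n):  # increasing order: subsets come first
            if mask & (1 << v):
                continue
            cand = (local_score(v, mask), mask)
            m = mask
            while m:  # best subset obtained by dropping one candidate
                low = m & -m
                sub = table[mask ^ low]
                if sub[0] > cand[0]:
                    cand = sub
                m ^= low
            table[mask] = cand
        best_ps.append(table)

    # best[S] = optimal score of a DAG on variable set S; sink[S] = last
    # variable in an optimal topological order of S.
    best: Dict[int, float] = {0: 0.0}
    sink: Dict[int, int] = {}
    for S in range(1, 1 << n):
        best_val = float("-inf")
        m = S
        while m:
            low = m & -m
            v = low.bit_length() - 1
            rest = S ^ low
            val = best[rest] + best_ps[v][rest][0]
            if val > best_val:
                best_val, sink[S] = val, v
            m ^= low
        best[S] = best_val

    # Reconstruct parent sets by peeling off sinks from the full set.
    parents: Dict[int, List[int]] = {}
    S = full
    while S:
        v = sink[S]
        rest = S ^ (1 << v)
        pmask = best_ps[v][rest][1]
        parents[v] = [u for u in range(n) if pmask >> u & 1]
        S = rest
    return best[full], parents


# Illustrative toy score (assumption): +2 per "good" edge, -1 per other edge.
GOOD_EDGES = {(0, 1), (1, 2)}


def toy_score(v: int, pmask: int) -> float:
    s = 0.0
    for u in range(pmask.bit_length()):
        if pmask >> u & 1:
            s += 2.0 if (u, v) in GOOD_EDGES else -1.0
    return s
```

With this toy score, `exact_dp(3, toy_score)` recovers the chain 0 → 1 → 2 with total score 4.0. Real scores would replace `toy_score`, and the pruning and divide-and-conquer criteria described above aim to avoid enumerating every subset.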