Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best score. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy, and our final output is the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has improved search ability but also allows a flexible score function under the acyclicity constraint.
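To make the reward structure concrete, the following is a minimal sketch of how a reward of this form might be computed for a candidate adjacency matrix. It assumes a NOTEARS-style smooth acyclicity measure and generic penalty weights (`lambda1`, `lambda2`); the specific score function and weight values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(A):
    """Smooth acyclicity measure h(A) = tr(exp(A*A)) - d, which is zero
    if and only if the binary adjacency matrix A encodes a DAG
    (illustrative choice, borrowed from continuous DAG-constraint work)."""
    d = A.shape[0]
    return np.trace(expm(A * A)) - d

def reward(A, score, lambda1=1.0, lambda2=1.0):
    """Reward combining a predefined score with two acyclicity penalties:
    a hard indicator penalty that fires whenever A contains a cycle, and a
    smooth penalty proportional to the acyclicity measure. `score` is any
    predefined score function (e.g., BIC) evaluated on the observed data;
    names and default weights here are assumptions for illustration."""
    h = acyclicity(A)
    has_cycle = float(h > 0)  # hard penalty: any cyclic graph is penalized
    return -score(A) - lambda1 * has_cycle - lambda2 * h
```

In this sketch, the search procedure would simply keep track of the graph with the highest reward seen during training, consistent with using RL as a search strategy rather than learning a reusable policy.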