Discovering causal relations among a set of variables is a long-standing problem in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small-scale problems. In this work, we propose a novel RL-based approach for causal discovery that incorporates RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering-generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanism designed for each ordering. A generated ordering is then processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves a substantially improved performance over the existing RL-based method.
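To make the ordering-based pipeline concrete, below is a minimal, hypothetical sketch (not the architecture or reward described in the paper): a candidate ordering is scored by regressing each variable on its predecessors, a simple stand-in for the variable-selection step, and weak coefficients are pruned to obtain the graph implied by the ordering. The function name `ordering_reward`, the pruning threshold, and the BIC-style reward are illustrative assumptions, and the brute-force enumeration at the end merely stands in for the encoder-decoder ordering generator trained with RL.

```python
import numpy as np
from itertools import permutations

def ordering_reward(X, ordering, threshold=0.3):
    """Score a candidate variable ordering (hypothetical sketch).

    Each variable is regressed on its predecessors in the ordering; the
    total residual sum of squares yields a BIC-style reward, and weak
    regression coefficients are pruned to form the implied causal graph.
    """
    n, d = X.shape
    total_rss = 0.0
    adjacency = np.zeros((d, d))
    for pos, j in enumerate(ordering):
        parents = ordering[:pos]
        if not parents:
            # no predecessors: residual is deviation from the mean
            total_rss += np.sum((X[:, j] - X[:, j].mean()) ** 2)
            continue
        A = X[:, parents]
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        total_rss += np.sum((X[:, j] - A @ coef) ** 2)
        # prune weak coefficients to get the graph implied by the ordering
        for p, c in zip(parents, coef):
            if abs(c) > threshold:
                adjacency[p, j] = 1
    reward = -n * np.log(total_rss / n + 1e-12)  # higher is better
    return reward, adjacency

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # toy linear SEM: x0 -> x1 -> x2
    x0 = rng.normal(size=n)
    x1 = 2.0 * x0 + 0.1 * rng.normal(size=n)
    x2 = -1.5 * x1 + 0.1 * rng.normal(size=n)
    X = np.column_stack([x0, x1, x2])

    # exhaustive search over orderings, for illustration only; the paper's
    # method instead generates orderings step by step and optimizes with RL
    best = max(permutations(range(3)),
               key=lambda o: ordering_reward(X, list(o))[0])
    reward, graph = ordering_reward(X, list(best))
    print("best ordering:", best)
    print("implied graph:\n", graph)
```

On this toy data the chain ordering (0, 1, 2) attains the highest reward, and the pruned coefficients recover the edges x0 -> x1 -> x2, illustrating how an ordering plus variable selection determines a causal graph.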