State-of-the-art causal discovery methods usually assume that the observational data is complete. However, missing data is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the problem is to first impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may be suboptimal, as the imputation algorithm is unaware of the causal discovery step. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of the observations under the expectation-maximization (EM) framework. In the E-step, when computing the posterior distributions of parameters in closed form is not feasible, Monte Carlo EM is used to approximate the likelihood. In the M-step, MissDAG uses the density transformation to model the noise distributions with simpler and more specific formulations by virtue of the ANMs, and uses a likelihood-based causal discovery algorithm with a directed acyclic graph (DAG) prior as an inductive bias. We demonstrate the flexibility of MissDAG in incorporating various causal discovery algorithms, and its efficacy, through extensive simulations and real-data experiments.
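As a rough illustration of the EM loop described above, the following sketch covers only the simplest setting: jointly Gaussian data with ignorable missingness, where the E-step is available in closed form. It is not the MissDAG implementation. The function name `em_gaussian_missing` and the NaN encoding of missing entries are assumptions made for this example, and the M-step here is plain unconstrained maximum likelihood for the mean and covariance, whereas MissDAG would instead run a likelihood-based causal discovery algorithm with a DAG prior at that step.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    """EM estimate of the mean and covariance of Gaussian data.

    Hypothetical helper for illustration only. Missing entries in X
    are assumed to be encoded as NaN and to be ignorably missing.
    """
    n, d = X.shape
    miss = np.isnan(X)
    # Initialise from per-column statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)

    for _ in range(n_iter):
        s1 = np.zeros(d)        # expected sum of x
        s2 = np.zeros((d, d))   # expected sum of x x^T
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x_hat = X[i].copy()
            cov_fill = np.zeros((d, d))
            if m.any() and o.any():
                # E-step: condition the missing block on the observed block.
                k = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                x_hat[m] = mu[m] + k @ (X[i, o] - mu[o])
                cov_fill[np.ix_(m, m)] = sigma[np.ix_(m, m)] - k @ sigma[np.ix_(o, m)]
            elif m.any():
                # Nothing observed in this row: fall back to the current model.
                x_hat = mu.copy()
                cov_fill = sigma.copy()
            s1 += x_hat
            s2 += np.outer(x_hat, x_hat) + cov_fill
        # M-step (assumption): unconstrained MLE, no DAG structure imposed.
        mu = s1 / n
        sigma = s2 / n - np.outer(mu, mu)
    return mu, sigma
```

The expected sufficient statistics `s1` and `s2` are the quantities a structure learner would consume in place of the raw data. Under that reading, a DAG-constrained M-step would replace only the last two lines of the loop.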