State-of-the-art causal discovery methods usually assume that the observational data is complete. However, missing data is pervasive in many practical settings, such as clinical trials, economics, and biology. One straightforward remedy is to first impute the data with an off-the-shelf imputation method and then apply an existing causal discovery method. Such a two-step approach can be suboptimal, because the imputation algorithm may introduce bias into the modeled data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of the observations under the expectation-maximization (EM) framework. In the E-step, when the posterior distributions of parameters cannot be computed in closed form, Monte Carlo EM is used to approximate the likelihood. In the M-step, MissDAG uses the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs, and applies a likelihood-based causal discovery algorithm with a directed acyclic graph constraint. We demonstrate the flexibility of MissDAG in incorporating various causal discovery algorithms, and its efficacy, through extensive simulations and real data experiments.
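To make the EM idea concrete, the following is a minimal sketch, not the MissDAG implementation. It assumes the simplest setting where the E-step has a closed form: jointly Gaussian variables with ignorable missingness, with missing entries encoded as NaN. The function name `em_gaussian_missing` and its parameters are hypothetical. The M-step here is a plain unconstrained Gaussian maximum-likelihood update; in a MissDAG-style method, the expected sufficient statistics would instead be passed to a likelihood-based causal discovery algorithm with an acyclicity constraint, which is not shown.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    """Hypothetical helper: EM estimate of the mean and covariance of a
    multivariate Gaussian from rows that contain NaN (missing) entries.
    Assumes no column is entirely missing."""
    n, d = X.shape
    mask = np.isnan(X)
    # Initialize from the observed entries only.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0)) + 1e-6 * np.eye(d)

    for _ in range(n_iter):
        sum_x = np.zeros(d)
        sum_xx = np.zeros((d, d))
        # E-step: expected sufficient statistics given the observed part.
        for i in range(n):
            m = mask[i]          # missing coordinates of row i
            o = ~m               # observed coordinates of row i
            x_hat = X[i].copy()
            cov_fill = np.zeros((d, d))
            if m.any():
                if o.any():
                    # K = Sigma_mo Sigma_oo^{-1}
                    K = np.linalg.solve(sigma[np.ix_(o, o)],
                                        sigma[np.ix_(o, m)]).T
                    # Conditional mean of the missing part.
                    x_hat[m] = mu[m] + K @ (X[i, o] - mu[o])
                    # Conditional covariance of the missing part.
                    cov_fill[np.ix_(m, m)] = (sigma[np.ix_(m, m)]
                                              - K @ sigma[np.ix_(o, m)])
                else:
                    # Nothing observed: fall back to the current marginal.
                    x_hat[m] = mu[m]
                    cov_fill[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cov_fill
        # M-step (unconstrained Gaussian MLE, an assumption of this sketch).
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma
```

Unlike impute-then-learn, the `cov_fill` term carries the uncertainty of the missing entries into the covariance estimate, which is one way to see why a single-pass imputation can bias the modeled distribution.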