Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focussing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this paper, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs. As one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both, neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrative example.
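To make the distinction between test-wise and list-wise deletion concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a Fisher z test of partial correlation as the conditional independence test (the abstract does not name a specific test), and the function names `fisher_z_testwise` and `listwise_n` as well as the synthetic data are hypothetical. Test-wise deletion drops only the rows with missing values in the variables involved in the current test, whereas list-wise deletion drops every row with a missing value in any variable.

```python
import math
import numpy as np


def fisher_z_testwise(data, i, j, cond=()):
    """Return (p_value, n_used) for X_i independent of X_j given X_cond.

    Assumption: Gaussian data and a Fisher z test; rows are removed by
    test-wise deletion, i.e. only if a variable of THIS test is missing.
    """
    cols = [i, j, *cond]
    sub = data[:, cols]
    complete = ~np.isnan(sub).any(axis=1)  # rows complete for this test only
    sub = sub[complete]
    n = sub.shape[0]
    dof = n - len(cond) - 3
    if dof <= 0:
        raise ValueError("too few complete rows for this test")
    # Partial correlation of the first two columns given the rest,
    # read off the inverse of the correlation matrix.
    prec = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(dof) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value, n


def listwise_n(data):
    """Number of rows that survive list-wise deletion."""
    return int((~np.isnan(data).any(axis=1)).sum())


# Hypothetical synthetic data: chain X0 -> X1 -> X2, plus an unrelated X3
# in which every second value is missing.
rng = np.random.default_rng(0)
n_rows = 2000
x0 = rng.normal(size=n_rows)
x1 = 0.8 * x0 + rng.normal(size=n_rows)
x2 = 0.8 * x1 + rng.normal(size=n_rows)
x3 = rng.normal(size=n_rows)
data = np.column_stack([x0, x1, x2, x3])
data[::2, 3] = np.nan

p_dep, n_dep = fisher_z_testwise(data, 0, 1)           # dependent pair
p_ci, n_ci = fisher_z_testwise(data, 0, 2, cond=(1,))  # X0 vs X2 given X1
n_list = listwise_n(data)
print(f"test-wise rows: {n_ci}, list-wise rows: {n_list}")
```

In this sketch, tests that do not involve X3 use all rows, while list-wise deletion would discard half of the sample for every test. Whether test-wise deletion recovers the correct graph depends on the missingness mechanism, which is what the recoverability conditions in the paper address.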