Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To address this problem, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; in contrast, our method provides reliable coverage while achieving more accurate causal discovery than data splitting.