Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Causal effect identification considers whether an interventional probability distribution can be uniquely determined, without parametric assumptions, from measured source distributions and structural knowledge of the generating system. While complete graphical criteria and procedures exist for many identification problems, some challenging but important extensions have not yet been considered in the literature. To tackle these new settings, we present a search algorithm that operates directly over the rules of do-calculus. Due to the generality of do-calculus, the search can take more advanced data-generating mechanisms into account, along with source distributions of arbitrary type, both observational and experimental. The search is enhanced with a heuristic and with search-space reduction techniques. The approach, called do-search, is provably sound, and it is complete with respect to identifiability problems that have been shown to be completely characterized by do-calculus. When extended with additional rules, the search can also handle missing-data problems. This versatility allows us to approach new problems such as combined transportability and selection bias, or multiple sources of selection bias. We perform a systematic analysis of bivariate missing-data problems and study causal inference under a case-control design. We also present the R package dosearch, which provides an interface to a C++ implementation of the search.
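
Each step of a search over do-calculus is licensed by one of Pearl's three rules, and each rule's side condition is a d-separation test in a modified graph. The sketch below illustrates only that building block, not the do-search algorithm, its heuristic, or the dosearch package. It assumes a DAG given as (parent, child) pairs, with latent confounders written as explicit nodes such as `U`; all function names are hypothetical, and d-separation is tested with the standard moralized ancestral graph criterion.

```python
from itertools import combinations


def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG `edges`."""
    result = set(nodes)
    frontier = list(result)
    while frontier:
        v = frontier.pop()
        for parent, child in edges:
            if child == v and parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def d_separated(edges, xs, ys, zs):
    """Test (xs ⊥ ys | zs) with the moralized ancestral graph criterion."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(edges, xs | ys | zs)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize: drop edge directions and marry parents of a common child.
    adj = {v: set() for v in keep}
    for a, b in sub:
        adj[a].add(b)
        adj[b].add(a)
    for v in keep:
        parents = [a for a, b in sub if b == v]
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning set, then search for any path from xs to ys.
    seen = xs - zs
    frontier = list(seen)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True


def mutilate(edges, no_in=(), no_out=()):
    """Drop edges entering `no_in` and edges leaving `no_out`."""
    return [(a, b) for a, b in edges if b not in no_in and a not in no_out]


def rule1(edges, y, x, z, w):
    # P(y | do(x), z, w) = P(y | do(x), w) if (Y ⊥ Z | X, W) in G_{bar X}
    return d_separated(mutilate(edges, no_in=x), y, z, set(x) | set(w))


def rule2(edges, y, x, z, w):
    # P(y | do(x), do(z), w) = P(y | do(x), z, w)
    # if (Y ⊥ Z | X, W) in G_{bar X, underline Z}
    return d_separated(mutilate(edges, no_in=x, no_out=z), y, z, set(x) | set(w))


def rule3(edges, y, x, z, w):
    # P(y | do(x), do(z), w) = P(y | do(x), w)
    # if (Y ⊥ Z | X, W) in G_{bar X, bar Z(W)}, where Z(W) is the set of
    # Z-nodes that are not ancestors of any W-node in G_{bar X}.
    anc_w = ancestors(mutilate(edges, no_in=x), w)
    z_w = {v for v in z if v not in anc_w}
    return d_separated(mutilate(edges, no_in=set(x) | z_w), y, z, set(x) | set(w))
```

For the back-door graph `Z -> X`, `Z -> Y`, `X -> Y`, the call `rule2(edges, {"Y"}, set(), {"X"}, {"Z"})` returns `True`, licensing P(y | do(x), z) = P(y | x, z), while the same call with an empty `w` returns `False`. A search over derivations would invoke checks like these many times; how do-search orders and prunes them is not reproduced here.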

