Reasoning about the effect of interventions and counterfactuals is a fundamental task found throughout the data sciences. A collection of principles, algorithms, and tools has been developed over the last decades for performing such tasks (Pearl, 2000). One pervasive requirement throughout this literature is the articulation of assumptions, which commonly appear in the form of causal diagrams. Despite the power of this approach, there are significant settings where the knowledge necessary to specify a causal diagram over all variables is not available, particularly in complex, high-dimensional domains. In this paper, we introduce a new graphical modeling tool called cluster DAGs (C-DAGs for short) that allows for the partial specification of relationships among variables based on limited prior knowledge, alleviating the stringent requirement of specifying a full causal diagram. A C-DAG specifies relationships between clusters of variables, while the relationships between the variables within a cluster are left unspecified. It can therefore be seen as a graphical representation of an equivalence class of causal diagrams that share the relationships among the clusters. We develop the foundations and machinery for valid inferences over C-DAGs about the clusters of variables at each layer of Pearl's Causal Hierarchy (Pearl and Mackenzie 2018; Bareinboim et al. 2020): L1 (probabilistic), L2 (interventional), and L3 (counterfactual). In particular, we prove the soundness and completeness of d-separation for probabilistic inference in C-DAGs. Further, we demonstrate the validity of Pearl's do-calculus rules over C-DAGs and show that the standard ID algorithm is sound and complete for systematically computing causal effects from observational data given a C-DAG. Finally, we show that C-DAGs are valid for performing counterfactual inferences about clusters of variables.
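To make the idea of a C-DAG as a representative of an equivalence class concrete, the following is a minimal sketch, not the construction used in the paper. It assumes a cluster-level edge rule that the abstract does not spell out: a directed edge is drawn from one cluster to another whenever some variable in the first is a parent of some variable in the second. It also assumes that a partition is usable only if the resulting cluster graph is acyclic, and it ignores latent confounders (bidirected edges). The names `cluster_dag` and `is_acyclic` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict


def cluster_dag(edges, clusters):
    """Collapse a variable-level DAG into a cluster-level edge set.

    edges:    iterable of (parent, child) pairs over variables.
    clusters: dict mapping cluster name -> set of variables (a partition).
    Assumed rule (not stated in the abstract): Ci -> Cj iff some variable
    in Ci is a parent of some variable in Cj, with Ci != Cj.
    """
    owner = {}
    for name, members in clusters.items():
        for v in members:
            if v in owner:
                raise ValueError(f"{v} appears in more than one cluster")
            owner[v] = name
    cedges = set()
    for parent, child in edges:
        a, b = owner[parent], owner[child]
        if a != b:  # edges inside a cluster are left unspecified
            cedges.add((a, b))
    return cedges


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {n: 0 for n in nodes}
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
        indeg[b] += 1
    queue = [n for n in indeg if indeg[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return visited == len(indeg)


# Two causal diagrams that differ only inside the cluster Z.
clusters = {"X": {"X"}, "Z": {"Z1", "Z2"}, "Y": {"Y"}}
g1 = [("X", "Z1"), ("Z1", "Z2"), ("Z2", "Y")]
g2 = [("X", "Z2"), ("Z2", "Z1"), ("Z1", "Y")]

# A partition that groups the endpoints of a chain A -> B -> C together,
# which makes the cluster graph cyclic under the assumed rule.
bad_clusters = {"P": {"A", "C"}, "Q": {"B"}}
g3 = [("A", "B"), ("B", "C")]
```

Under these assumptions, `g1` and `g2` collapse to the same cluster-level edges (X to Z and Z to Y), so a single C-DAG stands in for both diagrams, whereas `bad_clusters` yields edges in both directions between P and Q and would be rejected by the acyclicity check.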