Learning causal structure from observational data often assumes that we observe independent and identically distributed (i.\,i.\,d.) data. The traditional approach aims to find a graphical representation that encodes the same set of conditional independence relationships as those present in the observed distribution. It is known that under the i.\,i.\,d.\ assumption, even with infinite data, there is a limit to how fine-grained a causal structure we can identify. To overcome this limitation, recent work has explored using data originating from different, related environments to learn richer causal structure. These approaches implicitly rely on the independent causal mechanisms (ICM) principle, which postulates that the mechanism giving rise to an effect given its causes and the mechanism generating the causes do not inform or influence each other. Thus, components of the causal model can change independently from environment to environment. Despite its wide application in machine learning and causal inference, the ICM principle lacks a statistical formalization, and it remains unclear how it enables identification of richer causal structures from grouped data. Here we present new causal de Finetti theorems which offer a first statistical formalization of the ICM principle and show how causal structure identification is possible from exchangeable data. Our work provides theoretical justification for a broad range of techniques that leverage multi-environment data to learn causal structure.