The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our identifiability results into a practical method to recover the shared latent causal graph. Moreover, we study how multiple domains reduce errors in falsely detecting shared causal variables in the finite data setting.
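To make the unpaired setting concrete, here is a minimal sketch of how data of this kind could be simulated. It is an illustration under stated assumptions, not the paper's method: the latent graph `A`, the mixing matrices `G1` and `G2`, the Laplace noise, and the helper `sample_domain` are all hypothetical, and every latent variable is treated as shared across domains for simplicity. The point it shows is that each domain is sampled independently, so only the marginal distribution of each domain is available and no rows are paired.

```python
import numpy as np

def sample_domain(n, mixing, adjacency, rng):
    """Draw n observations X = G Z, where Z follows a linear structural model."""
    d = adjacency.shape[0]
    # Non-Gaussian (Laplace) noise is an illustrative assumption.
    eps = rng.laplace(size=(n, d))
    # Z = A Z + eps  implies  Z = (I - A)^{-1} eps (rows of z are samples).
    z = eps @ np.linalg.inv(np.eye(d) - adjacency).T
    # Linear mixing into the observed space of this domain.
    return z @ mixing.T

rng = np.random.default_rng(0)

# Hypothetical shared latent graph Z1 -> Z2 -> Z3.
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])

# Hypothetical domain-specific mixing matrices (5 and 4 observed variables).
G1 = rng.normal(size=(5, 3))
G2 = rng.normal(size=(4, 3))

# Independent draws per domain: sample sizes differ and rows are not aligned,
# so only the two marginal distributions are observed, never the joint.
X1 = sample_domain(1000, G1, A, rng)
X2 = sample_domain(1500, G2, A, rng)
```

In this sketch the identifiability question is whether `A` and the joint distribution of the two domains can be recovered from `X1` and `X2` alone.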