We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant than the dependence between certain high-level, latent features (e.g. concepts or objects), and this is the setting of interest. We provide conditions under which both the latent representations and the underlying latent causal model are identifiable by a reduction to a mixture oracle. These results highlight an intriguing connection between the well-studied problem of learning the order of a mixture model and the problem of learning the bipartite structure between observables and unobservables. The proof is constructive, and leads to several algorithms for explicitly reconstructing the full graphical model. We discuss efficient algorithms and provide experiments illustrating the algorithms in practice.