Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors' support (i.e., the values each latent factor can possibly take). For example, when the latent factors are causally connected, an intervention can break the dependence between the intervened latent's support and that of its ancestors. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect $do$ interventions. Moreover, given data from imperfect interventions, we achieve block affine identification: each estimated latent factor is entangled with only a few other latents. These results highlight the unique power of interventional data in causal representation learning: it enables provable identification of the latent factors without any assumptions about their distributions or dependency structure.
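The key observation can be illustrated with a minimal simulation. The sketch below (an illustrative toy example, not the paper's method; all variable names and the specific two-variable linear structural model are assumptions for the sake of the example) contrasts the support of a latent under observation versus under a perfect $do$ intervention: observationally, the support of a child latent shifts with its ancestor, whereas a $do$ intervention collapses it to a single point, severing that dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy two-variable causal model (assumed for illustration):
# z1 is an ancestor of z2, so observationally the support
# of z2 shifts with the value of z1.
z1 = rng.uniform(0.0, 1.0, n)
z2_obs = 0.8 * z1 + rng.uniform(0.0, 0.2, n)

# Perfect do-intervention do(z2 = 0.5): z2 is clamped to a
# constant, breaking the dependence on its ancestor z1.
z2_do = np.full(n, 0.5)

# Observationally, z2 tracks z1 (strong correlation);
# under the intervention, its support is the single point {0.5},
# independent of z1 -- the geometric signature exploited above.
print(np.corrcoef(z1, z2_obs)[0, 1])
print(z2_do.min(), z2_do.max())
```

This degenerate, ancestor-independent support under intervention is the kind of geometric signature the abstract refers to.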