Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings far removed from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness motivated by the principle of independent causal mechanisms. Through extensive experiments, we find that models built with knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.