Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet existing efforts are largely limited to simple synthetic settings far removed from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness derived from the independent causal mechanisms principle. Through extensive experiments, we find that models built with knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.