CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations

Causal Representation Learning (CRL) aims to uncover the data-generating process and identify the underlying causal variables and relations. Evaluating CRL methods remains inherently challenging because it requires known ground-truth causal variables and causal structure. Existing evaluations typically rely on either simplistic synthetic datasets or downstream performance on real-world tasks, and therefore face a trade-off between realism and evaluative precision. In this paper, we introduce a new benchmark for CRL built on high-fidelity simulated visual data that retains realistic visual complexity and, more importantly, provides access to the ground-truth causal generating processes. The dataset comprises around 200 thousand images and 3 million video frames across 24 sub-scenes in four domains: static image generation, dynamic physical simulations, robotic manipulations, and traffic situation analysis. These scenarios range from static to dynamic settings, from simple to complex structures, and from single-agent to multi-agent interactions, offering a comprehensive testbed intended to bridge the gap between rigorous evaluation and real-world applicability. In addition, we provide flexible access to the underlying causal structures, allowing users to modify or configure them to match the assumptions a given CRL method requires, such as available domain labels, temporal dependencies, or intervention histories. Using this benchmark, we evaluate representative CRL methods across diverse paradigms and offer empirical insights to help practitioners and newcomers choose or extend appropriate CRL frameworks for the specific types of real-world problems that can benefit from the CRL perspective. Project page: https://causal-verse.github.io/ and dataset: https://huggingface.co/CausalVerse
