Towards Physically Executable 3D Gaussian for Embodied Navigation

3D Gaussian Splatting (3DGS), a 3D representation with photorealistic real-time rendering, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability that Visual-Language Navigation (VLN) requires. To address this, we propose SAGE-3D (Semantically and Physically Aligned Gaussian Environments for 3D Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds fine-grained object-level annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release InteriorGS, a dataset of 1K object-annotated 3DGS indoor scenes, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark, with 2M VLN data samples. Experiments show that training on 3DGS scene data is harder to converge but yields strong generalizability, improving baseline performance by 31% on the VLN-CE Unseen task. Our data and code are available at: https://sage-3d.github.io.
