3D Gaussian Splatting (3DGS), a 3D representation method capable of photorealistic real-time rendering, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability that Visual-Language Navigation (VLN) requires. To address this, we propose SAGE-3D (Semantically and Physically Aligned Gaussian Environments for 3D Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds fine-grained object-level annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release InteriorGS, a dataset of 1K object-annotated 3DGS indoor scenes, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark, with 2M VLN data samples. Experiments show that training on 3DGS scene data is harder to converge but generalizes strongly, improving baseline performance by 31% on the VLN-CE Unseen task. Our data and code are available at: https://sage-3d.github.io.