LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, which limits generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error and its metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3\% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the \href{https://steinate.github.io/logoplanner.github.io/}{project page}.
