Realistic long-horizon tasks such as image-goal navigation involve both exploratory and exploitative phases. Given an image of the goal, an embodied agent must first explore to discover the goal, i.e., search efficiently using learned priors. Once the goal is discovered, the agent must accurately calibrate the last mile of navigation to the goal. As with any robust system, switching between exploratory goal discovery and exploitative last-mile navigation enables better recovery from errors. Following these intuitive guide rails, we propose SLING to improve the performance of existing image-goal navigation systems. Our approach is entirely complementary to prior methods: we focus on last-mile navigation and leverage the underlying geometric structure of the problem using neural descriptors. With simple but effective switches, SLING connects easily with heuristic, reinforcement learning, and neural modular policies. On a standardized image-goal navigation benchmark (Hahn et al. 2021), we improve performance across policies, scenes, and episode complexity, raising the state-of-the-art success rate from 45% to 55%. Beyond photorealistic simulation, we conduct real-robot experiments in three physical scenes and find that these improvements transfer well to real environments.
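To make the idea of switching between the two phases concrete, the following is a minimal sketch and not the method's actual implementation. It assumes the switch is driven by a per-step count of descriptor matches between the current view and the goal image. The names `SwitchConfig`, `next_mode`, and `run_switches`, the threshold values, and the use of two separate thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

EXPLORE = "explore"
LAST_MILE = "last_mile"


@dataclass(frozen=True)
class SwitchConfig:
    # Hypothetical thresholds; the abstract does not specify any values.
    enter_matches: int = 20  # matches needed to start last-mile navigation
    exit_matches: int = 10   # fall back to exploration below this count


def next_mode(mode: str, num_matches: int, cfg: SwitchConfig) -> str:
    """Pick the phase for the next step from the goal-match count."""
    if mode == EXPLORE and num_matches >= cfg.enter_matches:
        return LAST_MILE  # goal appears to be in view: exploit
    if mode == LAST_MILE and num_matches < cfg.exit_matches:
        return EXPLORE    # goal lost: recover by exploring again
    return mode


def run_switches(match_counts, cfg=None):
    """Return the mode chosen at each step, starting in exploration."""
    cfg = cfg or SwitchConfig()
    mode = EXPLORE
    modes = []
    for n in match_counts:
        mode = next_mode(mode, n, cfg)
        modes.append(mode)
    return modes
```

In this sketch the exploration policy is left abstract, which mirrors the claim that the last-mile module can be attached to heuristic, reinforcement learning, or neural modular explorers. The gap between the two thresholds is an assumed design choice that keeps the agent from flipping modes on every noisy frame.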