We propose a novel architecture and training paradigm for realistic PointGoal Navigation: navigating to a target coordinate in an unseen environment under actuation and sensor noise, without access to ground-truth localization. We find that the primary challenge in this setting is learning localization. When stripped of idealized localization, agents fail to stop precisely at the goal despite reliably making progress towards it. To address this, we introduce a set of auxiliary losses that help the agent learn localization. Further, we explore the idea of treating the precise location of the agent as privileged information: it is unavailable at test time but available during training in simulation. We grant the agent restricted access to ground-truth localization readings during training via an information bottleneck. Under this setting, the agent incurs a penalty for using the privileged information, which encourages it to rely on that information only when it is crucial to learning. This enables the agent to first learn navigation and then learn localization, instead of conflating the two objectives during training. We evaluate our method in both a semi-idealized setting (noiseless simulation without Compass+GPS) and a realistic setting (simulation with noise added). Our method outperforms existing baselines by 18\%/21\% SPL/Success in the semi-idealized setting and by 15\%/20\% SPL in the realistic setting. These gains in Success and SPL indicate that our agent self-localizes more accurately while maintaining a strong navigation policy. Our implementation can be found at https://github.com/NicoGrande/habitat-pointnav-via-ib.
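To make the penalized access to privileged information concrete, the following is a minimal sketch of a generic variational information bottleneck, not the implementation in the linked repository. It assumes the ground-truth pose is encoded as a diagonal Gaussian, that the usage penalty is the KL divergence to a standard normal prior weighted by a coefficient `beta`, and that the latent is drawn from the prior at test time. All function names and parameters here are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def bottleneck(mu, log_var, rng, training=True):
    """Return (z, kl) for one step.

    Assumption: during training, z is sampled from the encoder's Gaussian
    over the privileged pose and the KL term is the usage penalty. At test
    time the pose is unavailable, so z is drawn from the prior and no
    penalty applies.
    """
    if not training:
        return [rng.gauss(0.0, 1.0) for _ in mu], 0.0
    # Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]
    return z, kl_to_standard_normal(mu, log_var)


def total_loss(policy_loss, kl, beta):
    """Policy objective plus the weighted penalty for using privileged pose."""
    return policy_loss + beta * kl


# Hypothetical usage: a 2-D latent summarizing the ground-truth pose.
rng = random.Random(0)
z_train, penalty = bottleneck([0.3, -0.1], [-1.0, -1.0], rng, training=True)
loss = total_loss(policy_loss=1.0, kl=penalty, beta=0.01)
z_test, _ = bottleneck([0.0, 0.0], [0.0, 0.0], rng, training=False)
```

Under these assumptions, a larger `beta` pushes the encoder towards the prior, so the policy receives less pose information and must rely on its own learned localization.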