In this work, we present a memory-augmented approach for image-goal navigation. Earlier attempts, including RL-based and SLAM-based approaches, have either generalized poorly or relied heavily on pose/depth sensors. Our method is based on an attention-based end-to-end model that leverages an episodic memory to learn to navigate. First, we train a state-embedding network in a self-supervised fashion, and then use it to embed previously visited states into the agent's memory. Our navigation policy takes advantage of this information through an attention mechanism. We validate our approach with extensive evaluations and show that our model establishes a new state of the art on the challenging Gibson dataset. Furthermore, we achieve this performance from RGB input alone, without access to additional information such as position or depth, in stark contrast to related work.
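The memory read described above, where previously visited states are stored as embeddings and the policy attends over them, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each visited state has already been mapped to a fixed-size vector by the pretrained state-embedding network, and it uses single-head scaled dot-product attention with the current observation's embedding as the query. The names `EpisodicMemory`, `write`, and `read` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class EpisodicMemory:
    """Hypothetical episodic memory: one embedding per previously visited state.

    Assumption: embeddings come from a separately trained state-embedding
    network; here they are just plain lists of floats.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.slots: List[Vector] = []

    def write(self, embedding: Sequence[float]) -> None:
        # Append the embedding of the state the agent just visited.
        if len(embedding) != self.dim:
            raise ValueError("embedding has the wrong dimension")
        self.slots.append(list(embedding))

    def read(self, query: Sequence[float]) -> Vector:
        # Scaled dot-product attention: weight each stored slot by its
        # similarity to the query, then return the weighted sum of slots.
        if not self.slots:
            return [0.0] * self.dim
        scale = 1.0 / math.sqrt(self.dim)
        weights = softmax([dot(query, slot) * scale for slot in self.slots])
        return [
            sum(w * slot[i] for w, slot in zip(weights, self.slots))
            for i in range(self.dim)
        ]


if __name__ == "__main__":
    memory = EpisodicMemory(dim=2)
    memory.write([1.0, 0.0])  # embedding of an earlier state
    memory.write([0.0, 1.0])  # embedding of another earlier state
    context = memory.read([4.0, 0.0])  # query resembling the first state
    print(context)
```

In this sketch the readout vector is dominated by the stored state most similar to the query; a policy network could then consume it alongside the current and goal embeddings, though how the actual model combines them is not specified here.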