This work addresses object-goal visual navigation: finding the location of an object of a given class, where at each step the agent receives an egocentric RGB image of the scene. We propose to learn the agent's policy using a reinforcement learning algorithm. Our key contribution is a novel attention probability model for visual navigation tasks. This attention encodes semantic information about the observed objects as well as spatial information about their location. This combination of the "what" and the "where" allows the agent to navigate toward the sought-after object effectively. The attention model is shown to improve the agent's policy and to achieve state-of-the-art results on commonly used datasets.
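To make the "what" and "where" combination concrete, the following is a minimal illustrative sketch, not the model from this work. It assumes the image is divided into grid cells, each with a semantic feature vector and a position vector, and that attention logits are a simple sum of a semantic match with the target and a learned spatial score. All names (`attention_probabilities`, `attend`, `target_embedding`, `spatial_weights`) and the toy values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attention_probabilities(cell_semantics, cell_positions,
                            target_embedding, spatial_weights):
    """Return one attention probability per grid cell.

    Assumption (hypothetical): the logit of a cell is the sum of
    a semantic term ("what") and a spatial term ("where").
    """
    logits = [
        dot(sem, target_embedding) + dot(pos, spatial_weights)
        for sem, pos in zip(cell_semantics, cell_positions)
    ]
    return softmax(logits)


def attend(cell_semantics, cell_positions, probs):
    """Probability-weighted sum of concatenated [semantic, position] features."""
    fused = [sem + pos for sem, pos in zip(cell_semantics, cell_positions)]
    dim = len(fused[0])
    return [sum(p * f[i] for p, f in zip(probs, fused)) for i in range(dim)]


# Toy 2x2 grid with made-up features; positions are (row, col).
cell_semantics = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
cell_positions = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
target_embedding = [0.0, 2.0]   # hypothetical embedding of the target class
spatial_weights = [0.0, 0.0]    # hypothetical; zero means no spatial preference

probs = attention_probabilities(
    cell_semantics, cell_positions, target_embedding, spatial_weights
)
context = attend(cell_semantics, cell_positions, probs)
```

In this sketch the attended vector `context` carries both the semantic content and the position of the cells that best match the target, and could be fed to a policy network as part of the agent's state. The actual fusion used by the attention model may differ.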