Object goal visual navigation is a challenging task that aims to guide a robot to find a target object based only on its visual observations, where the target is limited to the classes specified in the training stage. In real households, however, there may be numerous object classes that the robot needs to handle, and it is impractical to cover all of them in the training stage. To address this challenge, we propose a zero-shot object navigation task that combines zero-shot learning with object goal visual navigation, aiming to guide robots to find objects belonging to novel classes without any training samples. This task raises the need to generalize the learned policy to novel classes, an issue rarely addressed in object navigation based on deep reinforcement learning. To this end, we use "class-unrelated" data as input to alleviate overfitting to the classes specified in the training stage. The class-unrelated input consists of detection results and the cosine similarity of word embeddings, and contains no class-related visual features or knowledge graphs. Extensive experiments on the AI2-THOR platform show that our model outperforms the baseline models on both seen and unseen classes, demonstrating that our model is less class-sensitive and generalizes better. Our code is available at https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation.
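To make the class-unrelated input concrete, below is a minimal sketch (in Python/NumPy) of how detection results and word-embedding cosine similarities could be combined into a class-agnostic observation. All function and variable names here are hypothetical illustrations under assumed data formats, not the authors' actual implementation.

```python
# A minimal sketch of a "class-unrelated" input, assuming hypothetical
# detection and embedding formats; the paper's exact layout may differ.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two word-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def class_unrelated_input(detections, embeddings, target_class):
    """Build a class-agnostic observation: for each detected object, keep only
    its bounding box, confidence, and the cosine similarity between its class
    embedding and the target-class embedding (no raw class identity or
    class-specific visual features)."""
    target_emb = embeddings[target_class]
    features = []
    for det in detections:  # det: {"class": str, "bbox": [x1, y1, x2, y2], "conf": float}
        sim = cosine_similarity(embeddings[det["class"]], target_emb)
        features.append(det["bbox"] + [det["conf"], sim])
    return np.array(features, dtype=np.float32)

# Example usage with toy 4-d embeddings (real systems typically use
# pretrained embeddings such as 300-d GloVe vectors).
rng = np.random.default_rng(0)
embeddings = {c: rng.standard_normal(4) for c in ["Sofa", "Television", "Pillow"]}
detections = [{"class": "Sofa", "bbox": [10, 20, 50, 80], "conf": 0.9},
              {"class": "Pillow", "bbox": [30, 40, 45, 60], "conf": 0.7}]
print(class_unrelated_input(detections, embeddings, "Television"))
```

Because the policy only ever sees geometry, confidence, and a similarity score, the target class can be swapped at test time for a novel class without retraining, which is the intuition behind the generalization claim above.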