Object goal visual navigation is a challenging task that aims to guide a robot to find a target object based on its visual observations, where the target is limited to the classes specified during training. In real households, however, the robot may need to handle numerous object classes, and it is hard to cover all of them in the training stage. To address this challenge, we propose a task named zero-shot object navigation, which aims to guide robots to find objects belonging to novel classes without any training samples. To this end, we also propose a novel zero-shot object navigation framework. Our framework uses detection results and the cosine similarity between semantic word embeddings as input. This type of input is only weakly correlated with specific classes, so our framework can generalize the learned policy to novel classes. Extensive experiments on the AI2-THOR framework show that our model outperforms the baseline models on the zero-shot object navigation task, which demonstrates the generalization ability of our model. Our code is available at: https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation.
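
To illustrate the idea of a class-agnostic input, here is a minimal sketch (not the authors' implementation) of how detections can be combined with word-embedding similarity: each detected object contributes its geometry and confidence plus its embedding similarity to the target class, rather than an explicit class label. All names here (`detections`, `glove_embedding`) are illustrative assumptions.

```python
# Minimal sketch, assuming GloVe-style word embeddings and a standard
# object detector; names and shapes are illustrative, not the paper's API.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_navigation_input(detections, target_class, glove_embedding):
    """Build a class-agnostic feature per detection.

    detections: list of (class_name, bbox, confidence), with bbox = (x1, y1, x2, y2)
    glove_embedding: dict mapping class names to embedding vectors
    """
    target_vec = glove_embedding[target_class]
    features = []
    for class_name, bbox, confidence in detections:
        sim = cosine_similarity(glove_embedding[class_name], target_vec)
        # Feature = (x1, y1, x2, y2, detector confidence, similarity to target).
        # No explicit class index appears, so a policy trained on these
        # features can in principle transfer to classes unseen in training.
        features.append([*bbox, confidence, sim])
    return np.array(features, dtype=np.float32)
```

In this sketch, the only class-dependent signal is the scalar similarity to the target, which is exactly the kind of weakly class-correlated input the abstract describes.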