Object goal visual navigation is a challenging task that aims to guide a robot to find a target object based on its visual observations, where the target is limited to the classes pre-defined in the training stage. However, in real households there may be numerous target classes the robot needs to handle, and it is hard for all of these classes to be covered during training. To address this challenge, we study the zero-shot object goal visual navigation task, which aims at guiding robots to find targets belonging to novel classes without any training samples. To this end, we propose a novel zero-shot object navigation framework called the semantic similarity network (SSNet). Our framework uses the detection results and the cosine similarity between semantic word embeddings as input. This type of input has a weak correlation with specific classes, so our framework can generalize the learned policy to novel classes. Extensive experiments on the AI2-THOR platform show that our model outperforms the baseline models on the zero-shot object navigation task, which demonstrates the generalization ability of our model. Our code is available at: https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation.