In this paper, we focus on the problem of efficiently locating a target object described with free-form language using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual search, we propose a system in which a user can specify the target with a free-form language command; we call this system Zero-shot Active Visual Search (ZAVIS). ZAVIS detects the user-specified target object and plans its search using a semantic grid map built from static landmarks (e.g., a desk or bed). For efficient planning of object search patterns, ZAVIS considers commonsense knowledge-based co-occurrence and predictive uncertainty when deciding which landmarks to visit first. We validate the proposed method with respect to SR (success rate) and SPL (success weighted by path length) in both simulated and real-world environments. The proposed method outperforms previous methods in terms of SPL in simulated scenarios, with an average gap of 0.283. We further demonstrate ZAVIS with a Pioneer-3AT robot in real-world studies.
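The landmark-prioritization idea in the abstract (rank landmarks by commonsense co-occurrence with the target, discounted by predictive uncertainty and travel cost) can be illustrated with a minimal sketch. All names, fields, and the scoring formula below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: rank landmarks for a free-form target query.
# The scoring formula and all field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Landmark:
    name: str
    cooccurrence: float   # commonsense P(target found near this landmark)
    uncertainty: float    # predictive uncertainty of the estimate, in [0, 1]
    distance: float       # travel cost from the robot's current pose (m)

def rank_landmarks(landmarks):
    # Higher co-occurrence, lower uncertainty, and lower travel cost
    # all push a landmark earlier in the visit order.
    def score(lm: Landmark) -> float:
        return lm.cooccurrence * (1.0 - lm.uncertainty) / (1.0 + lm.distance)
    return sorted(landmarks, key=score, reverse=True)

plan = rank_landmarks([
    Landmark("desk", cooccurrence=0.7, uncertainty=0.2, distance=3.0),
    Landmark("bed",  cooccurrence=0.4, uncertainty=0.1, distance=1.0),
    Landmark("sink", cooccurrence=0.1, uncertainty=0.5, distance=2.0),
])
print([lm.name for lm in plan])  # -> ['bed', 'desk', 'sink']
```

Here the nearby, low-uncertainty "bed" outranks the higher co-occurrence but more distant "desk", reflecting the trade-off between semantic plausibility and search cost that the abstract describes.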