In this paper, we focus on the problem of efficiently locating a target object described in free-form language using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual search, we propose a system in which a user can specify the target with a free-form language command; we call this system Active Visual Search in the Wild (AVSW). AVSW detects and plans the search for a user-specified target object using a semantic grid map built from static landmarks (e.g., desks or beds). For efficient planning of object search patterns, AVSW considers commonsense knowledge-based co-occurrence and predictive uncertainty when deciding which landmarks to visit first. We validate the proposed method with respect to SR (success rate) and SPL (success weighted by path length) in both simulated and real-world environments. The proposed method outperforms previous methods in terms of SPL in simulated scenarios by an average margin of 0.283. We further demonstrate AVSW with a Pioneer-3AT robot in real-world studies.