For robots to be deployed successfully in multifaceted situations, a robot's understanding of its environment is indispensable. As state-of-the-art object detectors improve, so does the capability of robots to detect objects within their interaction domain. However, relying on such detectors binds the robot to a few trained classes and prevents it from adapting to unfamiliar surroundings beyond predefined scenarios. In such situations, humans could assist robots amidst the overwhelming number of interaction entities and impart the requisite expertise by acting as teachers. We propose a novel pipeline that harnesses human gaze and augmented reality in a human-robot collaboration context to teach a robot novel objects in its surrounding environment. By intertwining gaze (to guide the robot's attention to an object of interest) with augmented reality (to convey the respective class information), we enable the robot to quickly acquire a significant amount of automatically labeled training data on its own. Using transfer learning, we demonstrate the robot's capability to detect recently learned objects and evaluate the influence of different machine learning models and learning procedures as well as the amount of training data involved. Our multimodal approach proves to be an efficient and natural way to teach the robot novel objects based on a few instances and allows it to detect classes for which no training dataset is available. In addition, we make our dataset publicly available to the research community; it consists of RGB and depth data, intrinsic and extrinsic camera parameters, and regions of interest.