Visual servoing enables robotic systems to perform accurate closed-loop control, which is required in many applications. However, existing methods either require precise calibration of the robot's kinematic model and cameras, or use neural architectures that require large amounts of training data. In this work, we present a method for unsupervised learning of visual servoing that requires no prior calibration and is extremely data-efficient. Our key insight is that visual servoing does not depend on identifying the veridical kinematic and camera parameters, but only on an accurate generative model that predicts image feature observations from the joint positions of the robot. We demonstrate that with our model architecture and learning algorithm, we can consistently learn accurate models from fewer than 50 training samples (amounting to less than one minute of unsupervised data collection), and that such data-efficient learning is not possible with standard neural architectures. Further, we show that by using the generative model in the loop and learning online, a robotic system can recover from calibration errors and can detect and quickly adapt to unexpected changes in the robot-camera system (e.g., a bumped camera or new objects).
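To make the control loop described above concrete, the following is a minimal sketch of generative-model-based visual servoing. It is not the paper's architecture: the structured generative model is replaced here by a simple least-squares regressor on a trigonometric joint basis, and the robot/camera interface (`robot.q()`, `camera.features()`) is hypothetical. What the sketch does show is the core idea: fit a model mapping joint positions to image features from a small batch of samples, then servo by driving the model's predicted features toward a target.

```python
# Sketch only: a stand-in for the paper's structured generative model.
# Assumed (hypothetical) interface: Q is an (N, n_joints) array of visited
# joint positions and Y an (N, n_features) array of the corresponding
# flattened 2D image-feature coordinates, e.g. collected during ~1 min of
# unsupervised exploration.
import numpy as np

def basis(q):
    """Feature expansion of joint positions: [1, q, sin(q), cos(q)]."""
    return np.concatenate(([1.0], q, np.sin(q), np.cos(q)))

def fit_model(Q, Y):
    """Fit W by least squares so that basis(q) @ W approximates y."""
    Phi = np.stack([basis(q) for q in Q])        # (N, B)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # (B, F)
    return W

def predict(W, q):
    """Generative model: predicted image features at joint position q."""
    return basis(q) @ W

def model_jacobian(W, q, eps=1e-5):
    """Numerical Jacobian d(features)/d(q) of the learned model."""
    J = np.empty((W.shape[1], q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (predict(W, q + dq) - predict(W, q - dq)) / (2 * eps)
    return J

def servo_step(W, q, y_target, damping=1e-2, gain=0.5):
    """One damped least-squares step moving predicted features to target."""
    err = predict(W, q) - y_target
    J = model_jacobian(W, q)
    H = J.T @ J + damping * np.eye(q.size)
    return q - gain * np.linalg.solve(H, J.T @ err)
```

Under this sketch, the online adaptation mentioned in the abstract would amount to monitoring the discrepancy between `predict(W, q)` and the observed features at each step, and refitting `W` on a window of recent samples whenever that prediction error exceeds a threshold (e.g., after a bumped camera).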