This paper addresses the problem of mobile robot manipulation of novel objects via detection. Our approach uses vision and control as complementary functions that learn from real-world tasks. We develop a manipulation method based solely on detection, then introduce task-focused few-shot object detection to learn new objects and settings. The current paradigm for few-shot object detection uses existing annotated examples. In contrast, we extend this paradigm with active data collection and annotation selection that improves performance on specific downstream tasks (e.g., depth estimation and grasping). In experiments with our interactive approach to few-shot learning, we train a robot to manipulate objects directly from detection (ClickBot). ClickBot learns visual servo control from a single click of annotation, grasps novel objects in clutter and other settings, and achieves state-of-the-art results on an existing visual servo control and depth estimation benchmark. Finally, we establish a task-focused few-shot object detection benchmark to support future research: https://github.com/griffbr/TFOD.