This paper addresses the problem of mobile robot manipulation using object detection. Our approach treats detection and control as complementary functions that learn from real-world interactions. We develop an end-to-end manipulation method based solely on detection and introduce Task-focused Few-shot Object Detection (TFOD) to learn new objects and settings. Our robot collects its own training data and automatically determines when to retrain detection to improve performance across various subtasks (e.g., grasping). Notably, detection training is low-cost, and our robot learns to manipulate new objects using as few as four clicks of annotation. In physical experiments, our robot learns visual control from a single click of annotation and a novel update formulation, manipulates new objects in clutter and other mobile settings, and achieves state-of-the-art results on an existing visual servo control and depth estimation benchmark. Finally, we develop a TFOD Benchmark to support future object detection research for robotics: https://github.com/griffbr/tfod.