Grasping small objects surrounded by unstable or non-rigid material plays a crucial role in applications such as surgery, harvesting, construction, disaster recovery, and assisted feeding. This task is especially difficult when fine manipulation is required in the presence of sensor noise and perception errors; these errors inevitably trigger dynamic motion, which is challenging to model precisely. Circumventing the difficulty of building accurate models for contacts and dynamics, data-driven methods like reinforcement learning (RL) can optimize task performance via trial and error. Applying these methods to real robots, however, has been hindered by factors such as prohibitively high sample complexity or the high cost of training infrastructure for providing resets on hardware. This work presents CherryBot, an RL system that uses chopsticks for fine manipulation and surpasses human reactiveness on some dynamic grasping tasks. By carefully designing the training paradigm and algorithm, we study how to make a real-world robot learning system sample-efficient and general while reducing the human effort required for supervision. Our system shows continual improvement through 30 minutes of real-world interaction: through reactive retry, it achieves an almost 100% success rate on the demanding task of using chopsticks to grasp small objects swinging in the air. We demonstrate the reactiveness, robustness, and generalizability of CherryBot to varying object shapes and dynamics (e.g., external disturbances like wind and human perturbations). Videos are available at https://goodcherrybot.github.io/.