Self-supervised grasp learning, i.e., learning to grasp by trial and error, has made great progress. However, training such a model remains time-consuming, and applying it in practice is still a challenge. This work presents a method for accelerating robotic grasp learning by pre-training on coarse affordance maps of the objects to be grasped, built from quite a small dataset. The model produced by pre-training serves as an initialization policy that warm-starts grasp learning, guiding the robot to collect more effective rewards at the beginning of training. Each object in a coarse affordance map is annotated with a single key point, which greatly alleviates the labeling burden. Extensive experiments in simulation and on a real robot are conducted to evaluate the proposed method. The simulation results show that it accelerates grasp learning by nearly three times over a vanilla Deep Q-Network-based method. Tests on a real UR3 robot show that it reaches a grasp success rate of 89.5% with only 500 grasp attempts within about two hours, four times faster than its competitor. In addition, it exhibits outstanding generalization in grasping previously unseen novel objects. It outperforms some existing methods and has the potential to be applied directly to a robot for real-world grasp learning tasks.
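To make the two-stage idea concrete, the following is a minimal sketch, assuming PyTorch and a simple fully convolutional network; the class, function, and file names are illustrative assumptions, not the authors' actual implementation. It shows pre-training on keypoint-derived coarse affordance maps and then reusing the learned weights to warm-start a pixel-wise grasp Q-network.

```python
# Minimal sketch (assumptions: PyTorch; an FCN "AffordanceNet" mapping an RGB-D
# heightmap to a pixel-wise grasp-quality map; all names here are hypothetical).
import torch
import torch.nn as nn


class AffordanceNet(nn.Module):
    """Fully convolutional net predicting a pixel-wise grasp affordance / Q map."""

    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # one grasp-quality score per pixel
        )

    def forward(self, x):
        return self.net(x)


# --- Stage 1: supervised pre-training on coarse affordance maps -------------
# Each object is annotated with a single key point; a Gaussian blob centered on
# that point serves as the coarse affordance target (a cheap-to-label heatmap).
def pretrain(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for img, coarse_map in loader:  # coarse_map: keypoint-centred heatmap
            loss = loss_fn(model(img), coarse_map)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


# --- Stage 2: warm-started self-supervised (DQN-style) grasp learning -------
# The pre-trained weights initialize the Q-network, so early trial-and-error
# grasps are biased toward promising pixels and collect useful rewards sooner.
q_net = AffordanceNet()
q_net.load_state_dict(torch.load("pretrained_affordance.pth"))  # warm start
```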