This paper introduces a challenging object grasping task and proposes a self-supervised learning approach. The goal of the task is to grasp an object that cannot be picked up by a single parallel gripper alone, but only by harnessing environmental fixtures (e.g., walls, furniture, heavy objects). This Slide-to-Wall grasping task assumes no prior knowledge beyond a partial observation of the target object. The robot must therefore learn an effective policy from a scene observation that may include the target object, environmental fixtures, and other distracting objects. We formulate the problem as visual affordance learning, for which a Target-Oriented Deep Q-Network (TO-DQN) is proposed to efficiently learn visual affordance maps (i.e., Q-maps) that guide robot actions. Since training requires the robot to explore and collide with the fixtures, TO-DQN is first trained safely with a simulated robot manipulator and then deployed on a real robot. We show empirically that TO-DQN learns to solve the task in different simulated environment settings and outperforms both a standard Deep Q-Network (DQN) and a variant of it in training efficiency and robustness. Test performance in both simulation and real-robot experiments shows that the policy trained with TO-DQN performs comparably to humans.
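The abstract does not specify the network architecture or training details of TO-DQN. The following is a minimal sketch of the underlying pixel-wise Q-map formulation it builds on: a fully convolutional network maps a scene observation to a dense Q-map with one value per candidate action location, trained with a standard DQN temporal-difference loss. The input encoding (e.g., an RGB-D heightmap, possibly with a target mask channel), the layer sizes, and the hyperparameters are illustrative assumptions, not the authors' exact design.

```python
# A rough sketch of pixel-wise Q-map (visual affordance) learning.
# All architectural choices here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QMapNet(nn.Module):
    """Fully convolutional net: scene observation -> dense Q-map.

    Each output pixel scores one candidate action (e.g., a slide/push
    location); the policy acts at the argmax pixel.
    """
    def __init__(self, in_channels=4):  # e.g., RGB-D channels (assumption)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)  # one Q-value per pixel

    def forward(self, obs):
        return self.head(self.encoder(obs)).squeeze(1)  # (B, H, W)

def td_loss(net, target_net, obs, act_yx, reward, next_obs, done, gamma=0.9):
    """Standard DQN temporal-difference loss over the pixel action space."""
    q_map = net(obs)                                   # (B, H, W)
    b = torch.arange(obs.size(0))
    q_sa = q_map[b, act_yx[:, 0], act_yx[:, 1]]        # Q at executed pixels
    with torch.no_grad():
        q_next = target_net(next_obs).flatten(1).max(dim=1).values
        target = reward + gamma * (1.0 - done) * q_next
    return F.smooth_l1_loss(q_sa, target)
```

In this framing, the "target-oriented" aspect of TO-DQN would condition the Q-map on the observed target object (for instance, via an additional input channel encoding the target's partial observation), so that affordance values are learned relative to that specific object rather than to the scene as a whole; the exact mechanism is described in the paper body, not the abstract.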