The success of deep reinforcement learning (RL) and imitation learning (IL) in vision-based robotic manipulation typically hinges on expensive, large-scale data collection. With simulation, data to train a policy can be collected efficiently at scale, but the visual gap between sim and real makes deployment in the real world difficult. We introduce RetinaGAN, a generative adversarial network (GAN) approach that adapts simulated images to realistic ones with object-detection consistency. RetinaGAN is trained in an unsupervised manner without task loss dependencies, and preserves general object structure and texture in adapted images. We evaluate our method on three real-world tasks: grasping, pushing, and door opening. RetinaGAN improves upon the performance of prior sim-to-real methods for RL-based object instance grasping and remains effective even in the limited-data regime. When applied to a pushing task in a similar visual domain, RetinaGAN transfers with no additional real data requirements. We also show our method bridges the visual gap for a novel door opening task using imitation learning in a new visual domain. Visit the project website at https://retinagan.github.io/
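To make the object-detection consistency idea more concrete, the following is a minimal sketch of one plausible formulation: a frozen, pretrained object detector is run on both the simulated image and its GAN-adapted counterpart, and disagreement between the two sets of predictions is penalized. This is an illustrative assumption, not the paper's exact loss; the names `detector`, `sim_image`, and `adapted_image`, and the specific Huber/cross-entropy terms, are hypothetical.

```python
import tensorflow as tf


def perception_consistency_loss(detector, sim_image, adapted_image):
    """Hypothetical object-detection consistency penalty (sketch only).

    `detector` is assumed to be a frozen, pretrained detector returning
    per-anchor class logits and box regressions for a batch of images.
    """
    # Detector predictions on the original simulated image serve as targets;
    # gradients flow only through the GAN-adapted image.
    cls_sim, box_sim = detector(tf.stop_gradient(sim_image))
    cls_adapted, box_adapted = detector(adapted_image)

    # Penalize disagreement in box regressions (Huber) and in class scores
    # (cross-entropy against the detector's predictions on the source image).
    box_loss = tf.reduce_mean(tf.keras.losses.huber(box_sim, box_adapted))
    cls_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.nn.sigmoid(cls_sim), logits=cls_adapted))
    return box_loss + cls_loss
```

Because the consistency term depends only on a pretrained detector rather than task rewards or demonstrations, it can be added to a standard GAN objective without any task-specific labels, which is consistent with the unsupervised, task-loss-free training described above.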