This paper addresses the reality gap from a novel perspective, targeting the transfer of Deep Reinforcement Learning (DRL) policies learned in simulated environments to the real-world domain for visual control tasks. Instead of adopting the common solution of increasing the visual fidelity of the synthetic images output by simulators during the training phase, this paper tackles the problem by translating real-world image streams back to the synthetic domain during the deployment phase, to make the robot feel at home. We propose this as a lightweight, flexible, and efficient solution for visual control, since 1) no extra transfer steps are required during the expensive training of DRL agents in simulation; 2) the trained DRL agents are not constrained to deployment in only one specific real-world environment; 3) policy training and transfer operations are decoupled and can be conducted in parallel. In addition, we propose a conceptually simple yet highly effective shift loss that constrains the consistency between consecutive frames, eliminating the need for optical flow. We validate the shift loss on artistic style transfer for videos and on domain adaptation, and validate our visual control approach in real-world robot experiments. A video of our results is available at: https://goo.gl/b1xz1s.
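To make the shift-loss idea concrete, the following is a minimal PyTorch sketch under our own assumptions (the framework, function names, and crop-based shift are illustrative, not the paper's released code): the translator G is assumed to be fully convolutional and resolution-preserving, and a pixel shift is realized by cropping both tensors to the same overlapping region, so that consistency under small shifts stands in for flow-based temporal consistency.

```python
import torch
import torch.nn.functional as F

def shift(img: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Crop a batch of images (N, C, H, W) to the region visible after a
    shift of (dx, dy) pixels, so the two terms of the loss below cover the
    same overlapping area of the scene."""
    _, _, h, w = img.shape
    return img[:, :,
               max(dy, 0): h + min(dy, 0),
               max(dx, 0): w + min(dx, 0)]

def shift_loss(G, x: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Penalize the discrepancy between G applied to a shifted input and
    the shifted output of G on the original input. This encourages
    shift-equivariance of the translator, and hence consistency between
    consecutive frames under small camera motion, without computing
    optical flow. Assumes G preserves spatial resolution."""
    out_of_shifted = G(shift(x, dx, dy))   # G(shift(x))
    shifted_out = shift(G(x), dx, dy)      # shift(G(x))
    return F.mse_loss(out_of_shifted, shifted_out)
```

In training, (dx, dy) would typically be sampled as small random offsets per batch, and this term added to the translator's objective alongside its usual adaptation losses.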