We present a method for fast training of vision-based control policies on real robots. The key idea behind our method is to perform multi-task reinforcement learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training time. This allows for fast learning of the auxiliary policies, which subsequently generate good data for training the main, vision-based control policies. This method can be seen as an extension of the Scheduled Auxiliary Control (SAC-X) framework. We demonstrate the efficacy of our method on both a simulated and a real-world Ball-in-a-Cup game controlled by a robot arm. In simulation, our approach leads to significant learning speed-ups compared to standard SAC-X. On the real robot we show that the task can be learned from scratch, i.e., with no transfer from simulation and no imitation learning. Videos of our learned policies running on the real robot can be found at https://sites.google.com/view/rss-2019-sawyer-bic/.
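To make the described mechanism concrete, the following is a minimal sketch of the training-loop structure implied by the abstract: auxiliary policies act on low-dimensional privileged features that exist only at training time, the main policy acts on pixels, and all transitions land in one shared replay buffer so the vision policy learns off-policy from data the fast-learning auxiliary policies generate. All names here (BallInCupEnv, RandomPolicy, train) and the trivial placeholder learners are illustrative assumptions, not the paper's implementation.

```python
import random
import numpy as np

class BallInCupEnv:
    """Toy stand-in for the robot task: emits both pixel and feature observations."""
    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, action):
        self.t += 1
        # One reward per task: [main sparse reward, auxiliary shaped reward].
        rewards = np.array([0.0, -float(np.linalg.norm(action))])
        return self._obs(), rewards, self.t >= 50

    def _obs(self):
        return {"pixels": np.zeros((64, 64, 3)),   # what the main policy sees
                "features": np.random.randn(6)}    # privileged, training-time only

class RandomPolicy:
    """Placeholder for a learned policy; a real implementation would be an
    off-policy actor-critic learner conditioned on its observation type."""
    def __init__(self, obs_key):
        self.obs_key = obs_key                     # which observation this policy consumes

    def act(self, obs):
        return np.random.uniform(-1.0, 1.0, size=4)

    def update(self, batch):
        pass                                       # off-policy gradient step would go here

def train(num_episodes=10):
    env = BallInCupEnv()
    main = RandomPolicy("pixels")                  # vision-based main policy
    aux = [RandomPolicy("features")]               # feature-based auxiliary policies
    replay = []
    for _ in range(num_episodes):
        # SAC-X-style scheduling: pick one intention policy to control the episode.
        actor = random.choice([main] + aux)
        obs, done = env.reset(), False
        while not done:
            action = actor.act(obs[actor.obs_key])
            next_obs, rewards, done = env.step(action)
            # Store the full observation dict so every policy can reuse this data.
            replay.append((obs, action, rewards, next_obs))
            obs = next_obs
        batch = random.sample(replay, min(64, len(replay)))
        for policy in [main] + aux:
            policy.update(batch)                   # all policies learn from shared replay

if __name__ == "__main__":
    train()
```

The design point the sketch illustrates is that storing both observation types in every transition decouples who generated the data from who learns from it: episodes driven by the quickly-trained feature-based policies still provide valid off-policy training data for the pixel-based main policy.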