Visual domain randomization in simulated environments is a widely used method for transferring policies trained in simulation to real robots. However, domain randomization and augmentation hamper policy training. Because reinforcement learning already struggles with a noisy training signal, this additional nuisance can drastically impede training, and for difficult tasks it can even result in complete failure to learn. To overcome this problem, we propose pre-training a perception encoder that already provides an embedding invariant to the randomization. We demonstrate that this yields consistently improved results on randomized versions of DeepMind Control Suite tasks and on a stacking environment with arbitrary backgrounds, with zero-shot transfer to a physical robot.
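To make the idea of an embedding that is invariant to the randomization concrete, the following is a minimal sketch and not the method of the paper: the abstract does not specify the pre-training objective. It assumes a toy linear encoder, a hypothetical `randomize` function (brightness offset plus pixel noise), and a simple mean-squared invariance loss between the embedding of a clean image and the embeddings of its randomized views. All names and parameters are illustrative assumptions.

```python
import numpy as np

def randomize(obs, rng, strength=0.5):
    """Hypothetical visual randomization: brightness offset plus pixel noise."""
    offset = rng.uniform(-strength, strength)
    noise = rng.normal(0.0, 0.05, size=obs.shape)
    return np.clip(obs + offset + noise, 0.0, 1.0)

def encode(weights, obs):
    """Toy linear encoder (an assumption): flatten the image and project it."""
    return obs.reshape(-1) @ weights

def invariance_loss(weights, obs, rng, n_views=4):
    """Mean squared distance between the clean embedding and randomized views."""
    target = encode(weights, obs)
    views = [encode(weights, randomize(obs, rng)) for _ in range(n_views)]
    return float(np.mean([(v - target) ** 2 for v in views]))

# Usage: evaluate the loss for one random image and a random encoder.
rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 1.0, size=(8, 8, 3))   # stand-in for a rendered frame
weights = rng.normal(size=(8 * 8 * 3, 16))    # 16-dimensional embedding
print(invariance_loss(weights, obs, rng))
```

A loss of this form is minimized trivially by an encoder that maps every image to the same vector, so a practical objective would presumably need an additional term that keeps the embedding informative about the task.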